Add training data and summary of evaluation results.
README.md
---
license: apache-2.0
---

# mBERT swedish distilled base model (cased)

This model is a distilled version of [mBERT](https://huggingface.co/bert-base-multilingual-cased). It was distilled using Swedish data: the 2010-2015 portion of the [Swedish Culturomics Gigaword Corpus](https://spraakbanken.gu.se/en/resources/gigaword). The code for the distillation process can be found [here](https://github.com/AddedK/swedish-mbert-distillation/blob/main/azureML/pretrain_distillation.py). This was done as part of my Master's Thesis, *Task-agnostic knowledge distillation of mBERT to Swedish*.
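As background, task-agnostic distillation typically trains the student to match the teacher's softened output distribution. Below is a minimal sketch of such a soft-label objective; the exact formula and temperature are common choices and an assumption on my part, not taken from the linked training script.

```python
# Sketch of a soft-label distillation loss (assumed formulation, not the
# exact objective used in the linked script).
import math


def softmax(logits, temperature=1.0):
    """Temperature-scaled softmax over a list of logits."""
    scaled = [x / temperature for x in logits]
    m = max(scaled)  # subtract max for numerical stability
    exps = [math.exp(x - m) for x in scaled]
    total = sum(exps)
    return [e / total for e in exps]


def distillation_loss(teacher_logits, student_logits, temperature=2.0):
    """Cross-entropy of the student against the teacher's softened distribution."""
    t = softmax(teacher_logits, temperature)
    s = softmax(student_logits, temperature)
    return -sum(ti * math.log(si) for ti, si in zip(t, s))


# When the student matches the teacher exactly, the loss hits its minimum
# (the entropy of the teacher's softened distribution).
x = [2.0, 1.0, 0.1]
print(distillation_loss(x, x) <= distillation_loss(x, [0.1, 1.0, 2.0]))
```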
This is a 6-layer version of mBERT, having been distilled using the [LightMBERT] distillation method.

## Intended uses & limitations

You can use the raw model for either masked language modeling or next sentence prediction, but it is mostly intended to be fine-tuned on a downstream task.
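For instance, masked language modeling can be tried with the 🤗 Transformers `fill-mask` pipeline. Note that the model id below is a placeholder (this card does not state the repository name) and the `top_tokens` helper is mine:

```python
# Sketch of masked language modeling with this model.
# NOTE: "your-namespace/mbert-swedish-distilled" is a placeholder model id;
# substitute the actual repository name of this model.


def top_tokens(predictions, k=3):
    """Extract the k highest-scoring token strings from fill-mask output."""
    return [p["token_str"] for p in predictions[:k]]


if __name__ == "__main__":
    try:
        from transformers import pipeline

        unmasker = pipeline("fill-mask", model="your-namespace/mbert-swedish-distilled")
        preds = unmasker("Stockholm är Sveriges [MASK].")  # "Stockholm is Sweden's [MASK]."
        print(top_tokens(preds))
    except Exception as exc:  # transformers not installed, or model unavailable
        print(f"Skipping demo: {exc}")
```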

## Training data

The data used for distillation was the 2010-2015 portion of the [Swedish Culturomics Gigaword Corpus](https://spraakbanken.gu.se/en/resources/gigaword). The tokenized data had a file size of approximately 9 GB.

## Evaluation results

When evaluated on the [SUCX 3.0](https://huggingface.co/datasets/KBLab/sucx3_ner) dataset, it achieved an average F1 score of 0.859, which is competitive with the 0.866 obtained by mBERT.

When evaluated on the [English WikiANN](https://huggingface.co/datasets/wikiann) dataset, it achieved an average F1 score of 0.826, which is competitive with the 0.849 obtained by mBERT.

Additional results and comparisons are presented in my Master's Thesis.
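The F1 scores above are the usual entity-level harmonic mean of precision and recall used in NER evaluation. A minimal self-contained sketch of how such a score is computed (the gold and predicted spans below are illustrative toy data, not results from the thesis):

```python
# Minimal sketch of entity-level F1, the metric reported above.
# The gold/predicted spans are illustrative toy data.


def f1_score(gold: set, predicted: set) -> float:
    """Harmonic mean of precision and recall over exact entity-span matches."""
    if not gold or not predicted:
        return 0.0
    true_positives = len(gold & predicted)
    precision = true_positives / len(predicted)
    recall = true_positives / len(gold)
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)


# Entities as (start, end, label) token spans.
gold = {(0, 1, "LOC"), (4, 5, "PER"), (7, 9, "ORG")}
pred = {(0, 1, "LOC"), (4, 5, "PER"), (7, 8, "ORG")}  # one boundary error

print(round(f1_score(gold, pred), 3))  # prints 0.667: 2 of 3 entities match exactly
```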