Addedk committed on
Commit 674121e
1 Parent(s): 3a96032

Update README.md

Files changed (1)
  1. README.md +26 -0
README.md CHANGED
@@ -1,3 +1,29 @@
  ---
+ language: Swedish
  license: apache-2.0
  ---
+
+ # KB-BERT distilled base model (cased)
+
+ This model is a distilled version of [KB-BERT](https://huggingface.co/KB/bert-base-swedish-cased). It was distilled on Swedish data, specifically the 2010-2015 portion of the [Swedish Culturomics Gigaword Corpus](https://spraakbanken.gu.se/en/resources/gigaword). The code for the distillation process is available [here](https://github.com/AddedK/swedish-mbert-distillation/blob/main/azureML/pretrain_distillation.py). This work was done as part of my Master's Thesis, *Task-agnostic knowledge distillation of mBERT to Swedish*.
+
+
+ ## Model description
+ This is a 6-layer version of KB-BERT, distilled using the [LightMBERT](https://arxiv.org/abs/2103.06418) distillation method, but without freezing the embedding layer.
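+
+ As a rough illustration of this setup, the sketch below shows how a 6-layer student could be initialized from the 12-layer teacher with copied (but not frozen) embeddings. This is a minimal, hypothetical sketch: the layer-selection scheme and all other details are assumptions, and the actual distillation code is in the repository linked above.
+
+ ```python
+ from transformers import BertConfig, BertForMaskedLM
+
+ # Teacher: the original 12-layer KB-BERT.
+ teacher = BertForMaskedLM.from_pretrained("KB/bert-base-swedish-cased")
+
+ # Student: same configuration, but with 6 transformer layers.
+ student_config = BertConfig.from_pretrained(
+     "KB/bert-base-swedish-cased", num_hidden_layers=6
+ )
+ student = BertForMaskedLM(student_config)
+
+ # Copy the teacher's embeddings into the student. Unlike plain LightMBERT,
+ # they are not frozen afterwards, so they keep training during distillation.
+ student.bert.embeddings.load_state_dict(teacher.bert.embeddings.state_dict())
+
+ # Initialize each student layer from every second teacher layer
+ # (this mapping is an assumption made for the sketch).
+ for student_idx, teacher_idx in enumerate(range(0, 12, 2)):
+     student.bert.encoder.layer[student_idx].load_state_dict(
+         teacher.bert.encoder.layer[teacher_idx].state_dict()
+     )
+ ```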
+
+
+ ## Intended uses & limitations
+ You can use the raw model for either masked language modeling or next sentence prediction, but it's mostly intended to
+ be fine-tuned on a downstream task.
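+
+ Here is a minimal usage sketch for masked language modeling with the `transformers` pipeline. The model id is an assumption based on this repository's owner; replace it with the actual model id.
+
+ ```python
+ from transformers import pipeline
+
+ # "Addedk/kbbert-distilled-cased" is an assumed model id for this repository.
+ unmasker = pipeline("fill-mask", model="Addedk/kbbert-distilled-cased")
+
+ # KB-BERT uses the standard [MASK] token.
+ print(unmasker("Huvudstaden i Sverige är [MASK]."))
+ ```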
+
+
+ ## Training data
+
+ The data used for distillation was the 2010-2015 portion of the [Swedish Culturomics Gigaword Corpus](https://spraakbanken.gu.se/en/resources/gigaword).
+ The tokenized data had a file size of approximately 7.4 GB.
+
+ ## Evaluation results
+
+ When evaluated on the [SUCX 3.0](https://huggingface.co/datasets/KBLab/sucx3_ner) dataset, this model achieved an average F1 score of 0.887, which is competitive with the score obtained by KB-BERT, 0.894.
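+
+ A hedged sketch of how such a NER evaluation could be set up is shown below; the dataset configuration name, the label column name, and the model id are assumptions, and the actual evaluation code is described in the thesis.
+
+ ```python
+ from datasets import load_dataset
+ from transformers import AutoModelForTokenClassification, AutoTokenizer
+
+ # "original_cased" is an assumed configuration name for SUCX 3.0.
+ dataset = load_dataset("KBLab/sucx3_ner", "original_cased")
+
+ # "ner_tags" is the assumed name of the label column.
+ labels = dataset["train"].features["ner_tags"].feature.names
+
+ # Assumed model id; replace with the actual id of this repository.
+ model_id = "Addedk/kbbert-distilled-cased"
+ tokenizer = AutoTokenizer.from_pretrained(model_id)
+ model = AutoModelForTokenClassification.from_pretrained(
+     model_id, num_labels=len(labels)
+ )
+ # Fine-tune with Trainer and compute entity-level F1 (e.g. with seqeval).
+ ```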
+
+ Additional results and comparisons are presented in my Master's Thesis.