# Tiny BERT December 2022
This is a more up-to-date version of the [original tiny BERT](https://huggingface.co/google/bert_uncased_L-2_H-128_A-2) referenced in [Well-Read Students Learn Better: On the Importance of Pre-training Compact Models](https://arxiv.org/abs/1908.08962) (English only, uncased, trained with WordPiece masking).

In addition to being more up-to-date, it is more CPU friendly than its base version, but this is its first version and it is by no means perfect. It took a day on 8x A100s to train. 🤗

The model was trained on a cleaned December 2022 snapshot of Common Crawl and Wikipedia.
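
As a quick sanity check, the model can be loaded with the 🤗 Transformers `pipeline` API. This is a minimal sketch only: the repo id below is a placeholder, not this model's actual Hub id, so substitute the id of the checkpoint you are using.

```python
from transformers import pipeline

fill_mask = pipeline(
    "fill-mask",
    model="your-namespace/tiny-bert-december-2022",  # placeholder: replace with the actual repo id
)

# The model is uncased, so the tokenizer lowercases inputs; [MASK] is BERT's mask token.
for prediction in fill_mask("the capital of france is [MASK]."):
    print(prediction["token_str"], round(prediction["score"], 4))
```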

```
65825874694874, 'qnli_acc': 0.6199890170236134, 'rte_acc': 0.5595667870036101, 'wnli_acc': 0.5352112676056338}
```

I probably messed up the hyperparameters and the tokenizer a bit, unfortunately. Stay tuned for version 2 🚀🚀🚀

But please try it out on your downstream tasks; it might be more performant, and it should be cheap to fine-tune due to its size 🤗

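Since the model is tiny, a full fine-tune fits comfortably on a single GPU. Below is a rough sketch using the `Trainer` API with SST-2 as a stand-in downstream task; the repo id is again a placeholder, and the hyperparameters are illustrative defaults, not the ones used to train this model.

```python
from datasets import load_dataset
from transformers import (
    AutoModelForSequenceClassification,
    AutoTokenizer,
    Trainer,
    TrainingArguments,
)

model_id = "your-namespace/tiny-bert-december-2022"  # placeholder: replace with the actual repo id

tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForSequenceClassification.from_pretrained(model_id, num_labels=2)

# SST-2 as an example task; swap in your own dataset and text column.
dataset = load_dataset("glue", "sst2")

def tokenize(batch):
    return tokenizer(batch["sentence"], truncation=True, max_length=128)

dataset = dataset.map(tokenize, batched=True)

trainer = Trainer(
    model=model,
    args=TrainingArguments(output_dir="tiny-bert-sst2", num_train_epochs=3),
    train_dataset=dataset["train"],
    eval_dataset=dataset["validation"],
    tokenizer=tokenizer,  # enables dynamic padding via the default data collator
)
trainer.train()
```
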
## Dataset