muhtasham committed on
Commit
2f3d561
•
1 Parent(s): bd7b6e7

Update README.md

Files changed (1)
  1. README.md +4 -3
README.md CHANGED
@@ -13,7 +13,8 @@ license: apache-2.0
# Tiny BERT December 2022

This is a more up-to-date version of the [original tiny BERT](https://huggingface.co/google/bert_uncased_L-2_H-128_A-2) referenced in [Well-Read Students Learn Better: On the Importance of Pre-training Compact Models](https://arxiv.org/abs/1908.08962) (English only, uncased, trained with WordPiece masking).
- In addition to being more up-to-date, it is more CPU friendly than its base version, but its first version and is not perfect by no means.
+ In addition to being more up-to-date, it is more CPU-friendly than its base version, but this is a first version and is by no means perfect. Training took a day on 8x A100s. 🤗
+

The model was trained on a cleaned December 2022 snapshot of Common Crawl and Wikipedia.
@@ -45,8 +46,8 @@ OLM
65825874694874, 'qnli_acc': 0.6199890170236134, 'rte_acc': 0.5595667870036101, 'wnli_acc': 0.5352112676056338}
```

- Probably messed up with hyperparameters and tokenizer a bit, unfortunately. Stay tuned for version 2 🚀🚀🚀
-
+ Probably messed up the hyperparameters and tokenizer a bit, unfortunately. Anyway, stay tuned for version 2 🚀🚀🚀
+ But please try it out on your downstream tasks; it might be more performant, and it should be cheap to fine-tune due to its size 🤗

## Dataset
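For the "please try it out on your downstream tasks" suggestion, here is a minimal sketch using 🤗 Transformers. The model id is a placeholder assumption (substitute this repo's actual name), and the classification head and label count are illustrative, not taken from the commit.

```python
# Minimal sketch, not from the commit: loading this checkpoint with 🤗 Transformers.
# "muhtasham/tiny-bert-december-2022" is an assumed/placeholder model id.
from transformers import AutoTokenizer, AutoModelForSequenceClassification, pipeline

model_id = "muhtasham/tiny-bert-december-2022"  # placeholder -- use the real repo name

# Quick masked-LM sanity check; a tiny (2-layer) BERT runs comfortably on CPU.
fill = pipeline("fill-mask", model=model_id)
print(fill("paris is the capital of [MASK]."))

# For a downstream task, load a fresh classification head and fine-tune as usual
# (e.g. with transformers.Trainer); the small size keeps this cheap.
tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForSequenceClassification.from_pretrained(model_id, num_labels=2)
```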