# BERT-of-Theseus
See our paper "BERT-of-Theseus: Compressing BERT by Progressive Module Replacing".
BERT-of-Theseus is a compressed BERT obtained by progressively replacing the modules of the original model with compact successor modules during training.
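In rough form, the idea is that the original (predecessor) layers are grouped, and during training each group is stochastically swapped for a single compact successor layer. The sketch below is a simplified illustration of that replacement scheme, not the released implementation; the class and argument names are ours.

```python
import random
import torch.nn as nn

class TheseusEncoder(nn.Module):
    """Illustrative sketch: stochastically mix predecessor and successor modules."""
    def __init__(self, predecessor_layers, successor_layers, replace_prob=0.5):
        super().__init__()
        assert len(predecessor_layers) % len(successor_layers) == 0
        self.predecessor = nn.ModuleList(predecessor_layers)  # original layers (kept frozen)
        self.successor = nn.ModuleList(successor_layers)      # compact layers being trained
        self.group = len(predecessor_layers) // len(successor_layers)
        self.replace_prob = replace_prob                       # annealed toward 1.0 over training

    def forward(self, hidden_states):
        for i, succ_layer in enumerate(self.successor):
            # At inference only the compressed successor is used; during training each
            # predecessor block is replaced by its successor with probability replace_prob.
            if not self.training or random.random() < self.replace_prob:
                hidden_states = succ_layer(hidden_states)
            else:
                for pred_layer in self.predecessor[i * self.group:(i + 1) * self.group]:
                    hidden_states = pred_layer(hidden_states)
        return hidden_states
```

In the paper, only the successor modules (and the task head) receive gradients, and the replacement probability is scheduled to increase so that training ends with the successor handling the full forward pass on its own.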
## Load Pretrained Model on MNLI
We provide a 6-layer model pretrained on MNLI as a general-purpose checkpoint that transfers to other sentence classification tasks, outperforming DistilBERT (which has the same 6-layer structure) on six GLUE tasks (dev set).
| Method | MNLI | MRPC | QNLI | QQP | RTE | SST-2 | STS-B |
|---|---|---|---|---|---|---|---|
| BERT-base | 83.5 | 89.5 | 91.2 | 89.8 | 71.1 | 91.5 | 88.9 |
| DistilBERT | 79.0 | 87.5 | 85.3 | 84.9 | 59.9 | 90.7 | 81.2 |
| BERT-of-Theseus | 82.1 | 87.5 | 88.8 | 88.8 | 70.1 | 91.8 | 87.8 |
**Please note:** this checkpoint is intended for intermediate-task transfer learning, so it does not include a classification head for MNLI. Please fine-tune it on your target task before use (as you would with DistilBERT).
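For example, the checkpoint can be loaded with 🤗 Transformers and fine-tuned on a downstream task. This is a minimal sketch; the Hub model ID below is an assumption (replace it with the ID shown on this model page), and `num_labels` should match your task.

```python
# Minimal sketch: load the checkpoint and attach a fresh classification head
# for fine-tuning on a downstream task.
from transformers import AutoTokenizer, AutoModelForSequenceClassification

model_id = "canwenxu/BERT-of-Theseus-MNLI"  # assumed Hub ID

tokenizer = AutoTokenizer.from_pretrained(model_id)
# num_labels matches the target task (e.g. 2 for SST-2);
# the head is randomly initialized and must be fine-tuned.
model = AutoModelForSequenceClassification.from_pretrained(model_id, num_labels=2)

inputs = tokenizer("A gripping, well-acted film.", return_tensors="pt")
logits = model(**inputs).logits
print(logits.shape)  # torch.Size([1, 2]); not meaningful until the head is fine-tuned
```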