This is xlm-roberta-base, not distilled
Hi
I found that the size of this model is the same as the xlm-roberta-base model.
What do you mean by "distilled" in the name of this model?
This model was distilled from the deepset/xlm-roberta-large-squad2 model.
The number of params is the same as xlm-roberta-base!
Both have 278,084,405 parameters.
Whereas, on the other hand, monolingual RoBERTa and its distilled version have:
RoBERTa: 124,686,389
DistilRoBERTa: 82,159,157
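For anyone who wants to reproduce these counts, a quick sketch with transformers, assuming the standard Hub checkpoints for the monolingual models (roberta-base, distilroberta-base); exact numbers can shift slightly depending on which head class gets loaded:

```python
# Quick parameter-count check; counts may differ slightly depending on
# which head class from_pretrained picks for each checkpoint.
from transformers import AutoModel

for name in [
    "xlm-roberta-base",
    "deepset/xlm-roberta-base-squad2-distilled",
    "roberta-base",
    "distilroberta-base",
]:
    model = AutoModel.from_pretrained(name)
    print(f"{name}: {model.num_parameters():,}")
```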
Am I missing something here?
Hi @bilalghanem, I believe there is some confusion.
This model was distilled from xlm-roberta-large-squad2, which is the same size as the xlm-roberta-large model:
xlm-roberta-large-squad2: PyTorch weights are 2.24 GB (link)
xlm-roberta-large: PyTorch weights are 2.24 GB (link)
So this model (xlm-roberta-base-squad2-distilled) was made by distilling the large model into a base-sized model, which is why we expect it to be the same size as the xlm-roberta-base model:
xlm-roberta-base-squad2-distilled: PyTorch weights are 1.11 GB (link)
xlm-roberta-base-squad2: PyTorch weights are 1.11 GB (link)
xlm-roberta-base: PyTorch weights are 1.12 GB (link)
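As a rough sanity check, an fp32 checkpoint stores about 4 bytes per parameter, so those file sizes line up with the parameter counts (the xlm-roberta-large count below is an approximation, roughly 560M):

```python
# fp32 checkpoints weigh ~4 bytes per parameter, so file size tracks
# parameter count. The xlm-roberta-large count is approximate.
for name, n_params in [
    ("xlm-roberta-base", 278_084_405),
    ("xlm-roberta-large", 560_000_000),  # approximate
]:
    print(f"{name}: ~{n_params * 4 / 1e9:.2f} GB in fp32")
# -> ~1.11 GB and ~2.24 GB, matching the checkpoint sizes above
```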
Let me know if this answers your question. You can learn more about model distillation on our blog: Model Distillation with Haystack
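For context, distillation here means training the smaller base-sized student to match the large teacher's output distribution. A minimal, generic sketch of that idea (not deepset's exact training recipe; the temperature and example input are made up for illustration):

```python
# Generic logit-distillation sketch for extractive QA: the base-sized
# student learns to match the large teacher's start/end logits.
# Not deepset's exact recipe; temperature and inputs are illustrative.
import torch
import torch.nn.functional as F
from transformers import AutoModelForQuestionAnswering, AutoTokenizer

teacher = AutoModelForQuestionAnswering.from_pretrained(
    "deepset/xlm-roberta-large-squad2"
).eval()
student = AutoModelForQuestionAnswering.from_pretrained("xlm-roberta-base")
# Both models share the XLM-R sentencepiece vocabulary, so one tokenizer works.
tokenizer = AutoTokenizer.from_pretrained("xlm-roberta-base")

batch = tokenizer(
    "Who wrote Faust?", "Faust was written by Goethe.", return_tensors="pt"
)
T = 2.0  # softening temperature (illustrative)

with torch.no_grad():
    t_out = teacher(**batch)
s_out = student(**batch)

# KL divergence between softened teacher and student distributions,
# averaged over the start- and end-position heads.
pairs = [
    (s_out.start_logits, t_out.start_logits),
    (s_out.end_logits, t_out.end_logits),
]
loss = sum(
    F.kl_div(
        F.log_softmax(s / T, dim=-1),
        F.softmax(t / T, dim=-1),
        reduction="batchmean",
    ) * (T * T)
    for s, t in pairs
) / 2
loss.backward()  # in a real loop, combine with the hard-label QA loss
```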
I see, thanks for the clarification!
Then I'd suggest naming it deepset/xlm-roberta-large-squad2-distilled, not base, because the naming convention on Hugging Face is different: e.g. distilbert-base-uncased is a distilled model of bert-base-uncased, not bert-large-uncased.
Thanks!