---
license: apache-2.0
---
mBERT Swedish distilled base model (cased)
This model is a distilled version of mBERT. It was distilled on Swedish data, specifically the 2010-2015 portion of the Swedish Culturomics Gigaword Corpus. The code for the distillation process can be found here. This work was done as part of my Master's thesis, Task-agnostic knowledge distillation of mBERT to Swedish.
Model description
This is a 6-layer version of mBERT, distilled using the LightMBERT distillation method but without freezing the embedding layer.
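As a rough illustration of what such a student looks like, the sketch below builds a 6-layer BERT initialized from mBERT using the Hugging Face transformers API. The choice of copying the bottom six teacher layers is an assumption for illustration only, not necessarily the exact initialization used for this model; the embeddings are copied but left trainable, matching the note above about not freezing the embedding layer.

```python
from transformers import BertConfig, BertModel

# Load the mBERT teacher.
teacher = BertModel.from_pretrained("bert-base-multilingual-cased")

# Student: same configuration as mBERT but with only 6 transformer layers.
student_config = BertConfig.from_pretrained(
    "bert-base-multilingual-cased", num_hidden_layers=6
)
student = BertModel(student_config)

# Copy the embedding layer from the teacher (kept trainable, i.e. not frozen).
student.embeddings.load_state_dict(teacher.embeddings.state_dict())

# Initialize the student's 6 layers from the teacher's bottom 6 layers
# (an illustrative assumption about the layer-selection scheme).
for i in range(6):
    student.encoder.layer[i].load_state_dict(
        teacher.encoder.layer[i].state_dict()
    )
```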
Intended uses & limitations
You can use the raw model for either masked language modeling or next sentence prediction, but it's mostly intended to be fine-tuned on a downstream task.
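A minimal usage sketch for masked language modeling with the transformers fill-mask pipeline; the model identifier below is a placeholder and should be replaced with this repository's name:

```python
from transformers import pipeline

# Replace "path/to/this-model" with the actual model repository name.
unmasker = pipeline("fill-mask", model="path/to/this-model")

# Swedish example sentence with a BERT-style [MASK] token.
print(unmasker("Stockholm är Sveriges [MASK]."))
```

The pipeline returns the highest-scoring candidate tokens for the masked position along with their scores.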