---
license: apache-2.0
---
mBERT Swedish distilled base model (cased)
This model is a distilled version of mBERT. It was distilled on Swedish data, specifically the 2010-2015 portion of the Swedish Culturomics Gigaword Corpus. The code for the distillation process can be found here. This work was done as part of my Master's thesis, Task-agnostic knowledge distillation of mBERT to Swedish.
Model description
This is a 6-layer version of mBERT, distilled using the LightMBERT distillation method but without freezing the embedding layer.
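As a rough illustration of what such a student looks like, the sketch below builds a 6-layer BERT initialized from mBERT using the Hugging Face transformers API. The choice of copying the bottom six teacher layers is an assumption for illustration only, not necessarily the exact initialization used for this model; the embeddings are copied but left trainable, matching the note above about not freezing the embedding layer.

```python
from transformers import BertConfig, BertModel

# Load the mBERT teacher.
teacher = BertModel.from_pretrained("bert-base-multilingual-cased")

# Student: same configuration as mBERT but with only 6 transformer layers.
student_config = BertConfig.from_pretrained(
    "bert-base-multilingual-cased", num_hidden_layers=6
)
student = BertModel(student_config)

# Copy the embedding layer from the teacher (kept trainable, i.e. not frozen).
student.embeddings.load_state_dict(teacher.embeddings.state_dict())

# Initialize the student's 6 layers from the teacher's bottom 6 layers
# (an illustrative assumption about the layer-selection scheme).
for i in range(6):
    student.encoder.layer[i].load_state_dict(
        teacher.encoder.layer[i].state_dict()
    )
```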
Intended uses & limitations
You can use the raw model for either masked language modeling or next sentence prediction, but it's mostly intended to be fine-tuned on a downstream task.
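A minimal usage sketch for masked language modeling with the transformers fill-mask pipeline; the model identifier below is a placeholder and should be replaced with this repository's name:

```python
from transformers import pipeline

# Replace "path/to/this-model" with the actual model repository name.
unmasker = pipeline("fill-mask", model="path/to/this-model")

# Swedish example sentence with a BERT-style [MASK] token.
print(unmasker("Stockholm är Sveriges [MASK]."))
```

The pipeline returns the highest-scoring candidate tokens for the masked position along with their scores.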