nlpie/bio-miniALBERT-128

mohammadmahdinouri committed on Feb 11, 2023

Commit b45479f (1 parent: dc94ecc)

Update README.md

Files changed (1)

README.md +1 -1

README.md CHANGED

@@ -4,7 +4,7 @@ license: mit
 # Model
 miniALBERT is a recursive transformer model which uses cross-layer parameter sharing, embedding factorisation, and bottleneck adapters to achieve high parameter efficiency.
-Since miniALBERT is a compact model, it is trained using a layer-to-layer distillation technique, using the bert-base model as the teacher. Currently, this model is trained for one 100K steps on the PubMed Abstracts dataset.
+Since miniALBERT is a compact model, it is trained using a layer-to-layer distillation technique, using the BioBERT-v1.1 model as the teacher. Currently, this model is trained for 100K steps on the PubMed Abstracts dataset.
 In terms of architecture, this model uses an embedding dimension of 128, a hidden size of 768, an MLP expansion rate of 4, and a reduction factor of 16 for bottleneck adapters. In general, this model uses 6 recursions and has a unique parameter count of 11 million parameters.
 # Usage
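The architecture sentence in this README lists concrete hyperparameters (embedding dimension 128, hidden size 768, MLP expansion rate 4, adapter reduction factor 16, 6 recursions). As a rough illustration of what embedding factorisation and cross-layer parameter sharing save, the Python sketch below counts weight-matrix parameters only; biases, LayerNorms, position embeddings and the exact placement and number of adapters are ignored. MiniAlbertConfig and the helper functions are hypothetical names, not part of the miniALBERT code, and vocab_size = 30000 is an assumed value because the README does not state a vocabulary size. Because of these simplifications the totals are not expected to reproduce the 11 million figure.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class MiniAlbertConfig:
    """Hypothetical container for the hyperparameters stated in the README."""
    embedding_size: int = 128     # E: factorised embedding dimension
    hidden_size: int = 768        # H: transformer hidden size
    mlp_expansion: int = 4        # MLP width = H * expansion
    adapter_reduction: int = 16   # adapter bottleneck = H / reduction
    num_recursions: int = 6       # times the shared layer is applied
    vocab_size: int = 30000       # ASSUMPTION: not given in the README

    @property
    def intermediate_size(self) -> int:
        return self.hidden_size * self.mlp_expansion

    @property
    def adapter_size(self) -> int:
        return self.hidden_size // self.adapter_reduction


def embedding_weights(cfg: MiniAlbertConfig, factorised: bool = True) -> int:
    """Token-embedding weights (biases ignored)."""
    if factorised:
        # V x E lookup table followed by an E x H projection into the hidden size
        return cfg.vocab_size * cfg.embedding_size + cfg.embedding_size * cfg.hidden_size
    # Unfactorised: a single V x H table
    return cfg.vocab_size * cfg.hidden_size


def layer_weights(cfg: MiniAlbertConfig) -> int:
    """Weights of one transformer layer: Q, K, V, O projections plus two MLP matrices."""
    h, i = cfg.hidden_size, cfg.intermediate_size
    return 4 * h * h + 2 * h * i


def encoder_weights(cfg: MiniAlbertConfig, shared: bool = True) -> int:
    """Unique encoder weights with or without cross-layer parameter sharing."""
    per_layer = layer_weights(cfg)
    # With sharing, one layer is stored and re-applied num_recursions times
    return per_layer if shared else per_layer * cfg.num_recursions


def adapter_weights(cfg: MiniAlbertConfig) -> int:
    """Weights of one bottleneck adapter: H x b down-projection and b x H up-projection."""
    return 2 * cfg.hidden_size * cfg.adapter_size


if __name__ == "__main__":
    cfg = MiniAlbertConfig()
    print("MLP width:", cfg.intermediate_size)
    print("adapter bottleneck:", cfg.adapter_size)
    print("encoder, shared vs unshared:", encoder_weights(cfg), encoder_weights(cfg, shared=False))
    print("embedding, factorised vs not:", embedding_weights(cfg), embedding_weights(cfg, factorised=False))
```

Under these assumptions the MLP width is 3072 and the adapter bottleneck is 48; one shared layer holds about 7.08M weights versus about 42.5M for six separately stored layers, and the factorised embedding needs about 3.9M weights instead of about 23M.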