readme: add some interesting details about umT5
README.md
# umT5 Small

The UMT5 model was proposed in [UniMax: Fairer and More Effective Language Sampling for Large-Scale Multilingual Pretraining](https://openreview.net/forum?id=kXwdL1cWOAi) by Hyung Won Chung, Xavier Garcia, Adam Roberts, Yi Tay, Orhan Firat, Sharan Narang, and Noah Constant.

The abstract from the paper is the following:

*Pretrained multilingual large language models have typically used heuristic temperature-based sampling to balance between different languages. However previous work has not systematically evaluated the efficacy of different pretraining language distributions across model scales. In this paper, we propose a new sampling method, UniMax, that delivers more uniform coverage of head languages while mitigating overfitting on tail languages by explicitly capping the number of repeats over each language's corpus. We perform an extensive series of ablations testing a range of sampling strategies on a suite of multilingual benchmarks, while varying model scale. We find that UniMax outperforms standard temperature-based sampling, and the benefits persist as scale increases. As part of our contribution, we release: (i) an improved and refreshed mC4 multilingual corpus consisting of 29 trillion characters across 107 languages, and (ii) a suite of pretrained umT5 model checkpoints trained with UniMax sampling.*
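To make the repeat-capping idea concrete, here is a minimal Python sketch of a UniMax-style budget allocation, written from the abstract's description rather than the paper's released code; the `max_epochs` value and the smallest-corpus-first redistribution order are assumptions for illustration.

```python
# Minimal sketch of UniMax-style allocation, based on the abstract's
# description (not the paper's released code). Spread a total token
# budget as uniformly as possible over languages, but cap each language
# at `max_epochs` passes over its corpus and hand its unused share to
# the remaining (larger) languages.
def unimax_allocation(corpus_sizes, total_budget, max_epochs=4):
    """corpus_sizes: language -> corpus size in tokens.
    Returns: language -> number of pretraining tokens to draw."""
    remaining = total_budget
    allocation = {}
    # Visit tail (smallest) languages first so their capped leftover
    # can be redistributed to the head languages that follow.
    langs = sorted(corpus_sizes, key=corpus_sizes.get)
    for i, lang in enumerate(langs):
        uniform_share = remaining / (len(langs) - i)
        cap = max_epochs * corpus_sizes[lang]
        allocation[lang] = min(uniform_share, cap)
        remaining -= allocation[lang]
    return allocation

# Tiny example: the tail language "yo" is capped at 4 epochs instead of
# being oversampled; "en" and "de" absorb the freed-up budget evenly.
sizes = {"en": 1_000_000, "de": 200_000, "yo": 1_000}
print(unimax_allocation(sizes, total_budget=600_000))
```

Temperature-based sampling would instead keep upweighting the tiny corpus and repeat it many times; the explicit cap is what mitigates that overfitting on tail languages.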
# Integration into Transformers
Overview of umT5 model integration:
* Transformers integration is ongoing; see this awesome [PR](https://github.com/huggingface/transformers/pull/22626) by @agemagician! A tentative usage sketch is shown below.
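Until that PR is merged, the class and checkpoint names below are assumptions based on the existing T5 API; this is a sketch of what loading the model should roughly look like once integration lands, not a confirmed interface.

```python
# Tentative sketch: assumes the integration PR lands with T5-style Auto
# class support and that this checkpoint is published as
# "google/umt5-small"; both names are assumptions until the PR is merged.
from transformers import AutoTokenizer, AutoModelForSeq2SeqLM

tokenizer = AutoTokenizer.from_pretrained("google/umt5-small")
model = AutoModelForSeq2SeqLM.from_pretrained("google/umt5-small")

# Span-corruption style input, as with other pretrained T5 checkpoints.
inputs = tokenizer("A <extra_id_0> walks into a bar.", return_tensors="pt")
outputs = model.generate(**inputs, max_new_tokens=20)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
```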