---
library_name: transformers
license: apache-2.0
datasets:
- liswei/zhtw-news-and-articles-2B
base_model: apple/OpenELM-270M
language:
- zh
metrics:
- perplexity
pipeline_tag: text-generation
---

# Model Card for Chinese-OpenELM-270M

Continually pre-trained from [apple/OpenELM-270M](https://huggingface.co/apple/OpenELM-270M) on [liswei/zhtw-news-and-articles-2B](https://huggingface.co/datasets/liswei/zhtw-news-and-articles-2B):

* Extended the vocabulary from 32000 to 61758 tokens with additional Traditional Chinese characters.
* The tokenizer was trained on [liswei/zhtw-news-and-articles-2B](https://huggingface.co/datasets/liswei/zhtw-news-and-articles-2B) and pruned from 96000 to 61758 tokens while maintaining 95% coverage of the pre-training dataset (a pruning sketch follows this list).
* Additional token embeddings are initialized with the mean vector of the existing embeddings (an initialization sketch follows this list).
* Traditional Chinese perplexity = 1.6871 on a held-out evaluation dataset (a perplexity sketch follows this list).
* Applied [GaLore](https://arxiv.org/abs/2403.03507) for memory-efficient training with the following hyperparameters (a configuration sketch follows this list):
    * Rank: 1024
    * Scale: 4.0
    * Update interval: 200
    * Layer-wise training: False
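
The coverage criterion is not fully specified in this card. Below is a minimal frequency-based pruning sketch, assuming "95% coverage" means the retained tokens account for at least 95% of token occurrences in the pre-training corpus; `prune_vocab` and `corpus_texts` are illustrative names, not the actual training code:

```python
from collections import Counter

def prune_vocab(tokenizer, corpus_texts, coverage=0.95):
    # Count token occurrences over the corpus with the large (96000-token) tokenizer.
    counts = Counter()
    for text in corpus_texts:
        counts.update(tokenizer.tokenize(text))
    total = sum(counts.values())
    # Keep the most frequent tokens until the coverage target is reached.
    kept, covered = [], 0
    for token, freq in counts.most_common():
        kept.append(token)
        covered += freq
        if covered / total >= coverage:
            break
    return kept  # reported run: pruned from 96000 to 61758 tokens
```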
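A rough sketch of the mean-vector initialization, starting from the base checkpoint. It assumes the standard `transformers` embedding-resizing API rather than the exact script used here:

```python
import torch
from transformers import AutoModelForCausalLM

# OpenELM ships custom modeling code, hence trust_remote_code=True.
model = AutoModelForCausalLM.from_pretrained("apple/OpenELM-270M", trust_remote_code=True)

old_vocab_size = model.get_input_embeddings().weight.shape[0]  # 32000 in the base model
model.resize_token_embeddings(61758)  # extended vocabulary size reported above

with torch.no_grad():
    emb = model.get_input_embeddings().weight
    # Initialize the new rows with the mean of the existing embeddings.
    emb[old_vocab_size:] = emb[:old_vocab_size].mean(dim=0)
```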
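The perplexity figure is the exponential of the mean token-level cross-entropy on held-out text. A minimal sketch, assuming the model's `forward` accepts `labels` like standard `transformers` causal LMs; `eval_texts` is a placeholder for the held-out split:

```python
import math
import torch

def perplexity(model, tokenizer, eval_texts, device="cpu"):
    model.eval().to(device)
    total_nll, total_tokens = 0.0, 0
    for text in eval_texts:
        ids = tokenizer(text, return_tensors="pt").input_ids.to(device)
        with torch.no_grad():
            # Causal LMs shift labels internally and return the mean
            # cross-entropy over predicted tokens.
            loss = model(input_ids=ids, labels=ids).loss
        n_pred = ids.shape[1] - 1  # number of predicted tokens
        total_nll += loss.item() * n_pred
        total_tokens += n_pred
    return math.exp(total_nll / total_tokens)
```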
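These hyperparameters map directly onto the GaLore integration in `transformers` (it requires the `galore-torch` package); with layer-wise training disabled, the optimizer is `galore_adamw` rather than `galore_adamw_layerwise`. The output path and target-module regex below are assumptions, not taken from this card:

```python
from transformers import TrainingArguments

training_args = TrainingArguments(
    output_dir="chinese-openelm-270m",   # hypothetical output path
    optim="galore_adamw",                # layer-wise training False -> non-layer-wise variant
    optim_target_modules=[r".*proj.*"],  # assumed regex over module names
    optim_args="rank=1024, update_proj_gap=200, scale=4.0",
)
```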