---
library_name: transformers
license: apache-2.0
datasets:
- liswei/zhtw-news-and-articles-2B
base_model: apple/OpenELM-270M
language:
- zh
metrics:
- perplexity
pipeline_tag: text-generation
---

# Model Card for Chinese-OpenELM-270M

Continually pre-trained from [apple/OpenELM-270M](https://huggingface.co/apple/OpenELM-270M) on [liswei/zhtw-news-and-articles-2B](https://huggingface.co/datasets/liswei/zhtw-news-and-articles-2B):

* Extended the vocabulary from 32000 to 61758 tokens with additional Traditional Chinese characters.
* The tokenizer was trained on [liswei/zhtw-news-and-articles-2B](https://huggingface.co/datasets/liswei/zhtw-news-and-articles-2B) and pruned from 96000 to 61758 tokens while maintaining 95% coverage of the pre-training dataset (a pruning sketch follows this list).
* Additional token embeddings are initialized with the mean vector of the existing embeddings (an initialization sketch follows this list).
* Traditional Chinese perplexity = 1.6871 on a held-out evaluation dataset (a perplexity sketch follows this list).
* Applied [GaLore](https://arxiv.org/abs/2403.03507) for memory-efficient training with the following hyperparameters (a configuration sketch follows this list):
    * Rank: 1024
    * Scale: 4.0
    * Update interval: 200
    * Layer-wise training: False
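
The coverage criterion is not fully specified in this card. Below is a minimal frequency-based pruning sketch, assuming "95% coverage" means the retained tokens account for at least 95% of token occurrences in the pre-training corpus; `prune_vocab` and `corpus_texts` are illustrative names, not the actual training code:

```python
from collections import Counter

def prune_vocab(tokenizer, corpus_texts, coverage=0.95):
    # Count token occurrences over the corpus with the large (96000-token) tokenizer.
    counts = Counter()
    for text in corpus_texts:
        counts.update(tokenizer.tokenize(text))
    total = sum(counts.values())
    # Keep the most frequent tokens until the coverage target is reached.
    kept, covered = [], 0
    for token, freq in counts.most_common():
        kept.append(token)
        covered += freq
        if covered / total >= coverage:
            break
    return kept  # reported run: pruned from 96000 to 61758 tokens
```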
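A rough sketch of the mean-vector initialization, starting from the base checkpoint. It assumes the standard `transformers` embedding-resizing API rather than the exact script used here:

```python
import torch
from transformers import AutoModelForCausalLM

# OpenELM ships custom modeling code, hence trust_remote_code=True.
model = AutoModelForCausalLM.from_pretrained("apple/OpenELM-270M", trust_remote_code=True)

old_vocab_size = model.get_input_embeddings().weight.shape[0]  # 32000 in the base model
model.resize_token_embeddings(61758)  # extended vocabulary size reported above

with torch.no_grad():
    emb = model.get_input_embeddings().weight
    # Initialize the new rows with the mean of the existing embeddings.
    emb[old_vocab_size:] = emb[:old_vocab_size].mean(dim=0)
```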
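The perplexity figure is the exponential of the mean token-level cross-entropy on held-out text. A minimal sketch, assuming the model's `forward` accepts `labels` like standard `transformers` causal LMs; `eval_texts` is a placeholder for the held-out split:

```python
import math
import torch

def perplexity(model, tokenizer, eval_texts, device="cpu"):
    model.eval().to(device)
    total_nll, total_tokens = 0.0, 0
    for text in eval_texts:
        ids = tokenizer(text, return_tensors="pt").input_ids.to(device)
        with torch.no_grad():
            # Causal LMs shift labels internally and return the mean
            # cross-entropy over predicted tokens.
            loss = model(input_ids=ids, labels=ids).loss
        n_pred = ids.shape[1] - 1  # number of predicted tokens
        total_nll += loss.item() * n_pred
        total_tokens += n_pred
    return math.exp(total_nll / total_tokens)
```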
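These hyperparameters map directly onto the GaLore integration in `transformers` (it requires the `galore-torch` package); with layer-wise training disabled, the optimizer is `galore_adamw` rather than `galore_adamw_layerwise`. The output path and target-module regex below are assumptions, not taken from this card:

```python
from transformers import TrainingArguments

training_args = TrainingArguments(
    output_dir="chinese-openelm-270m",   # hypothetical output path
    optim="galore_adamw",                # layer-wise training False -> non-layer-wise variant
    optim_target_modules=[r".*proj.*"],  # assumed regex over module names
    optim_args="rank=1024, update_proj_gap=200, scale=4.0",
)
```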