---
language:
- en
- ko
license: llama3
library_name: transformers
datasets:
- legacy-datasets/wikipedia
pipeline_tag: text-generation
---
## Model Details
This model was continually pretrained from [Meta-Llama-3-8B](https://huggingface.co/meta-llama/Meta-Llama-3-8B) using English and Korean datasets.
The goal is to enhance its Korean proficiency while preserving the English capabilities of the original model.
### Datasets
We sampled 16B tokens from the following datasets for training:

| Sources | Tokens (Llama-3-8B) |
|---|---|
| AI-Hub | 9.2B |
| Modu Corpus | 5.8B |
| Wikipedia | 5.4B |
### Hyperparameters

| Learning rate | Optimizer | Betas | Weight decay | Warm-up ratio |
|---|---|---|---|---|
| 3e-5 | AdamW | (0.9, 0.95) | 0.1 | 0.05 |
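
For reference, the sketch below shows how these hyperparameters could be expressed as Hugging Face `TrainingArguments`. Only the values from the table above are filled in; the output directory is a placeholder, and settings not stated in this card (scheduler, batch size, precision) are intentionally left unspecified.

```python
from transformers import TrainingArguments

# Minimal sketch of the continual-pretraining hyperparameters listed above.
# Anything not in the table (scheduler, batch size, precision) is omitted.
training_args = TrainingArguments(
    output_dir="./continual-pretraining",  # placeholder path
    learning_rate=3e-5,                    # learning rate
    optim="adamw_torch",                   # AdamW optimizer
    adam_beta1=0.9,                        # betas (0.9, 0.95)
    adam_beta2=0.95,
    weight_decay=0.1,                      # weight decay
    warmup_ratio=0.05,                     # warm-up ratio
)
```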
## Intended Use
This model has not been fine-tuned, so you will need to train it on your own dataset before using it.
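
As a rough illustration, the snippet below shows one way to load the model with `transformers`, either as a starting point for your own fine-tuning or for raw text continuation. The repository ID is a placeholder (this card does not state it), and the prompt and generation settings are illustrative only.

```python
from transformers import AutoModelForCausalLM, AutoTokenizer

# Placeholder repository ID -- replace with this model's actual repo name.
model_id = "your-org/your-continually-pretrained-llama-3-8b"

tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(
    model_id,
    torch_dtype="auto",
    device_map="auto",
)

# This is a base (not fine-tuned) checkpoint, so it is best suited to raw
# continuation or as an initialization for further training.
prompt = "대한민국의 수도는"  # "The capital of South Korea is"
inputs = tokenizer(prompt, return_tensors="pt").to(model.device)
outputs = model.generate(**inputs, max_new_tokens=32)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
```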
## Evaluations
We evaluated this model on English benchmarks (MMLU, HellaSwag, GSM8K, BBH) and Korean benchmarks (KMMLU, HAE-RAE, KoBEST), and compared it with similar models continually pretrained from [Meta-Llama-3-8B](https://huggingface.co/meta-llama/Meta-Llama-3-8B).

| Model | MMLU (5-shot) | HellaSwag (10-shot) | GSM8K (8-shot, CoT) | BBH (3-shot, CoT) | KMMLU (5-shot) | HAE-RAE (5-shot) | KoBEST (5-shot) |
|---|---|---|---|---|---|---|---|
| meta-llama/Meta-Llama-3-8B | 65.1 | 82.1 | 52.0 | 61.9 | 40.2 | 61.1 | 69.2 |
| saltlux/Ko-Llama3-Luxia-8B | 57.1 | 77.1 | 32.3 | 51.8 | 39.4 | 69.2 | 71.9 |
| beomi/Llama-3-Open-Ko-8B | 56.2 | 77.4 | 31.5 | 46.8 | 40.3 | 68.1 | 72.1 |
| beomi/Llama-3-KoEn-8B | 52.5 | 77.7 | 21.2 | 43.2 | 40.8 | 71.3 | 73.8 |
| tesser-ai/Tesser-Llama-3-Ko-8B | 60.5 | 79.8 | 40.3 | 56.3 | 42.5 | 72.1 | 73.8 |
## Limitations
We trained this model with a 4k context length due to resource limitations and to maximize training speed.
However, the original model was trained with an 8k context length, so an 8k context may still work well in downstream tasks.
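
If you prefer to stay within the 4k context used during continual pretraining, one simple option is to truncate inputs at tokenization time. The snippet below is a minimal sketch with a placeholder repository ID and placeholder text.

```python
from transformers import AutoTokenizer

# Placeholder repository ID -- replace with this model's actual repo name.
tokenizer = AutoTokenizer.from_pretrained("your-org/your-continually-pretrained-llama-3-8b")

# Cap sequences at the 4k context length used during continual pretraining.
batch = tokenizer(
    ["예시 문서 ..."],  # example document(s); placeholder text
    truncation=True,
    max_length=4096,
    return_tensors="pt",
)
```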
## License
This model follows the original [Llama-3 license](https://llama.meta.com/llama3/license/).