---
language:
- en
- ko
license: llama3
library_name: transformers
datasets:
- legacy-datasets/wikipedia
pipeline_tag: text-generation
---
## Model Details
This model was continually pretrained from [Meta-Llama-3-8B](https://huggingface.co/meta-llama/Meta-Llama-3-8B) using English and Korean datasets.
The goal is to enhance its Korean proficiency while preserving the English capabilities of the original model.
### Datasets
We sampled 16B tokens from the following datasets for training:

| Sources | Tokens (Llama-3-8B) |
|---|---|
| AI-Hub | 9.2B |
| Modu Corpus | 5.8B |
| Wikipedia | 5.4B |
### Hyperparameters

| Learning rate | Optimizer | Betas | Weight decay | Warm-up ratio |
|---|---|---|---|---|
| 3e-5 | AdamW | (0.9, 0.95) | 0.1 | 0.05 |
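
For reference, the sketch below shows how these hyperparameters could be expressed as Hugging Face `TrainingArguments`. Only the values from the table above are filled in; the output directory is a placeholder, and settings not stated in this card (scheduler, batch size, precision) are intentionally left unspecified.

```python
from transformers import TrainingArguments

# Minimal sketch of the continual-pretraining hyperparameters listed above.
# Anything not in the table (scheduler, batch size, precision) is omitted.
training_args = TrainingArguments(
    output_dir="./continual-pretraining",  # placeholder path
    learning_rate=3e-5,                    # learning rate
    optim="adamw_torch",                   # AdamW optimizer
    adam_beta1=0.9,                        # betas (0.9, 0.95)
    adam_beta2=0.95,
    weight_decay=0.1,                      # weight decay
    warmup_ratio=0.05,                     # warm-up ratio
)
```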
## Intended Use
This model has not been fine-tuned, so you will need to train it on your own dataset before using it.
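
As a rough illustration, the snippet below shows one way to load the model with `transformers`, either as a starting point for your own fine-tuning or for raw text continuation. The repository ID is a placeholder (this card does not state it), and the prompt and generation settings are illustrative only.

```python
from transformers import AutoModelForCausalLM, AutoTokenizer

# Placeholder repository ID -- replace with this model's actual repo name.
model_id = "your-org/your-continually-pretrained-llama-3-8b"

tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(
    model_id,
    torch_dtype="auto",
    device_map="auto",
)

# This is a base (not fine-tuned) checkpoint, so it is best suited to raw
# continuation or as an initialization for further training.
prompt = "대한민국의 수도는"  # "The capital of South Korea is"
inputs = tokenizer(prompt, return_tensors="pt").to(model.device)
outputs = model.generate(**inputs, max_new_tokens=32)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
```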
## Evaluations
We evaluated this model on English benchmarks (MMLU, HellaSwag, GSM8K, BBH) and Korean benchmarks (KMMLU, HAE-RAE, KoBEST), and compared it with similar models continually pretrained from [Meta-Llama-3-8B](https://huggingface.co/meta-llama/Meta-Llama-3-8B).

| Model | MMLU (5-shot) | HellaSwag (10-shot) | GSM8K (8-shot, CoT) | BBH (3-shot, CoT) | KMMLU (5-shot) | HAE-RAE (5-shot) | KoBEST (5-shot) |
|---|---|---|---|---|---|---|---|
| meta-llama/Meta-Llama-3-8B | 65.1 | 82.1 | 52.0 | 61.9 | 40.2 | 61.1 | 69.2 |
| saltlux/Ko-Llama3-Luxia-8B | 57.1 | 77.1 | 32.3 | 51.8 | 39.4 | 69.2 | 71.9 |
| beomi/Llama-3-Open-Ko-8B | 56.2 | 77.4 | 31.5 | 46.8 | 40.3 | 68.1 | 72.1 |
| beomi/Llama-3-KoEn-8B | 52.5 | 77.7 | 21.2 | 43.2 | 40.8 | 71.3 | 73.8 |
| tesser-ai/Tesser-Llama-3-Ko-8B | 60.5 | 79.8 | 40.3 | 56.3 | 42.5 | 72.1 | 73.8 |
## Limitations
We trained this model with a 4k context length due to resource limitations and to maximize training speed.
However, the original model was trained with an 8k context length, so an 8k context may still work well in downstream tasks.
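
If you prefer to stay within the 4k context used during continual pretraining, one simple option is to truncate inputs at tokenization time. The snippet below is a minimal sketch with a placeholder repository ID and placeholder text.

```python
from transformers import AutoTokenizer

# Placeholder repository ID -- replace with this model's actual repo name.
tokenizer = AutoTokenizer.from_pretrained("your-org/your-continually-pretrained-llama-3-8b")

# Cap sequences at the 4k context length used during continual pretraining.
batch = tokenizer(
    ["예시 문서 ..."],  # example document(s); placeholder text
    truncation=True,
    max_length=4096,
    return_tensors="pt",
)
```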
## License
This model follows the original [Llama-3 license](https://llama.meta.com/llama3/license/).