---
language:
- en
- ko
license: llama3
library_name: transformers
datasets:
- legacy-datasets/wikipedia
pipeline_tag: text-generation
---
## Model Details
This model was continually pretrained from [Meta-Llama-3-8B](https://huggingface.co/meta-llama/Meta-Llama-3-8B) on English and Korean datasets.
The goal is to improve its Korean proficiency while preserving the English capabilities of the original model.
### Datasets
We sampled 16B tokens from the following datasets for training:
<table>
<tr>
<td><strong>Sources</strong>
</td>
<td><strong>Tokens (Llama-3-8B)</strong>
</td>
</tr>
<tr>
<td>AI-Hub
</td>
<td>9.2B
</td>
</tr>
<tr>
<td>Modu Corpus
</td>
<td>5.8B
</td>
</tr>
<tr>
<td>Wikipedia
</td>
<td>5.4B
</td>
</tr>
</table>
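The relationship between the 16B sampled tokens and the per-source counts can be checked with a little arithmetic. The sketch below assumes the table lists the tokens available in each source (counted with the Llama-3-8B tokenizer) and that the 16B training tokens were drawn from that combined pool; the card does not state how sampling was distributed across sources, so no per-source split is computed.

```python
# Token counts per source, in billions, copied from the table above.
SOURCE_TOKENS_B = {"AI-Hub": 9.2, "Modu Corpus": 5.8, "Wikipedia": 5.4}

# Number of tokens sampled for training, in billions, as stated above.
SAMPLED_TOKENS_B = 16.0

# Assumption: the table describes the available pool, not the sampled subset.
pool_tokens_b = sum(SOURCE_TOKENS_B.values())
sampled_fraction = SAMPLED_TOKENS_B / pool_tokens_b

print(f"Pool size: {pool_tokens_b:.1f}B tokens")
print(f"Sampled fraction of the pool: {sampled_fraction:.1%}")
```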
### Hyperparameters
<table>
<tr>
<td><strong>Learning rate</strong></td>
<td><strong>Optimizer</strong></td>
<td><strong>Betas</strong></td>
<td><strong>Weight decay</strong></td>
<td><strong>Warm-up ratio</strong></td>
</tr>
<tr>
<td>3e-5</td>
<td>AdamW</td>
<td>(0.9, 0.95)</td>
<td>0.1</td>
<td>0.05</td>
</tr>
</table>
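To make the warm-up ratio concrete, here is a minimal sketch of a learning-rate schedule built from the values in the table. The linear warm-up shape, the constant rate after warm-up, and the function name `learning_rate` are assumptions for illustration; the card specifies only the peak rate and the warm-up ratio, not the shape of the schedule or any decay.

```python
# Values copied from the hyperparameter table above.
PEAK_LR = 3e-5
BETAS = (0.9, 0.95)
WEIGHT_DECAY = 0.1
WARMUP_RATIO = 0.05


def learning_rate(step: int, total_steps: int) -> float:
    """Return the learning rate at a zero-based training step.

    Assumption: linear warm-up over the first WARMUP_RATIO of steps,
    then a constant PEAK_LR. The actual decay schedule is not documented.
    """
    warmup_steps = max(1, round(total_steps * WARMUP_RATIO))
    if step < warmup_steps:
        return PEAK_LR * (step + 1) / warmup_steps
    return PEAK_LR


if __name__ == "__main__":
    total = 1000  # hypothetical number of optimizer steps
    for step in (0, 24, 49, 500):
        print(step, learning_rate(step, total))
```

With PyTorch, the remaining values would map onto `torch.optim.AdamW(params, lr=PEAK_LR, betas=BETAS, weight_decay=WEIGHT_DECAY)`.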
## Intended Use
This is a pretrained base model that has not been fine-tuned for any downstream task, so you will need to fine-tune it on your own dataset before using it.
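As a starting point for fine-tuning, the checkpoint can be loaded with the `transformers` library named in the card metadata. This is a minimal sketch: the repository name is taken from the evaluation table, the helper name `load_base_model` is hypothetical, and `torch_dtype="auto"` is one reasonable choice rather than a documented requirement.

```python
# Repository name as listed in the evaluation table of this card.
MODEL_ID = "tesser/Tesser-Llama-3-Ko-8B"


def load_base_model(model_id: str = MODEL_ID):
    """Load the tokenizer and causal-LM weights for further training."""
    # Imported lazily so this module can be read without transformers installed.
    from transformers import AutoModelForCausalLM, AutoTokenizer

    tokenizer = AutoTokenizer.from_pretrained(model_id)
    # torch_dtype="auto" keeps the dtype stored in the checkpoint (an assumption
    # about what is convenient, not a requirement stated by the authors).
    model = AutoModelForCausalLM.from_pretrained(model_id, torch_dtype="auto")
    return tokenizer, model


if __name__ == "__main__":
    # Downloads the 8B-parameter weights on first use.
    tokenizer, model = load_base_model()
    print(type(model).__name__)
```

The returned objects can then be passed to whichever fine-tuning loop or trainer you prefer.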
## Evaluations
We evaluated this model on both English and Korean benchmarks and compared it with similar models that were continually pretrained from [Meta-Llama-3-8B](https://huggingface.co/meta-llama/Meta-Llama-3-8B). In each column, bold marks the best score and underline marks the next-best score.
<table>
<tr>
<td></td>
<td colspan="4"><strong>English</strong></td>
<td colspan="3"><strong>Korean</strong></td>
</tr>
<tr>
<td><strong>Model</strong></td>
<td><strong>MMLU (5 shots)</strong></td>
<td><strong>HellaSwag (10 shots)</strong></td>
<td><strong>GSM8K (8 shots, CoT)</strong></td>
<td><strong>BBH (3 shots, CoT)</strong></td>
<td><strong>KMMLU (5 shots)</strong></td>
<td><strong>HAE-RAE (5 shots)</strong></td>
<td><strong>KoBEST (5 shots)</strong></td>
</tr>
<tr>
<td>meta-llama/Meta-Llama-3-8B</td>
<td><strong>65.1</strong></td>
<td><strong>82.1</strong></td>
<td><strong>52.0</strong></td>
<td><strong>61.9</strong></td>
<td>40.2</td>
<td>61.1</td>
<td>69.2</td>
</tr>
<tr>
<td>saltlux/Ko-Llama3-Luxia-8B</td>
<td>57.1</td>
<td>77.1</td>
<td>32.3</td>
<td>51.8</td>
<td>39.4</td>
<td>69.2</td>
<td>71.9</td>
</tr>
<tr>
<td>beomi/Llama-3-Open-Ko-8B</td>
<td>56.2</td>
<td>77.4</td>
<td>31.5</td>
<td>46.8</td>
<td>40.3</td>
<td>68.1</td>
<td><u>72.1</u></td>
</tr>
<tr>
<td>beomi/Llama-3-KoEn-8B</td>
<td>52.5</td>
<td>77.7</td>
<td>21.2</td>
<td>43.2</td>
<td><u>40.8</u></td>
<td><u>71.3</u></td>
<td><strong>73.8</strong></td>
</tr>
<tr>
<td><strong>tesser/Tesser-Llama-3-Ko-8B</strong></td>
<td><u>60.5</u></td>
<td><u>79.8</u></td>
<td><u>40.3</u></td>
<td><u>56.3</u></td>
<td><strong>42.5</strong></td>
<td><strong>72.1</strong></td>
<td><strong>73.8</strong></td>
</tr>
</table>
## License
This model follows the original [Llama-3 license](https://llama.meta.com/llama3/license/).