+---
+language:
+  - en
+  - ja
+library_name: transformers
+pipeline_tag: text-generation
+license: llama3
+model_type: llama
+---
+# Llama 3 Swallow
+The Llama 3 Swallow models are built by continual pre-training of the [Llama 3 family](https://huggingface.co/collections/meta-llama/meta-llama-3-66214712577ca38149ebb2b6), primarily on additional Japanese language data. The Instruct versions are built with supervised fine-tuning (SFT) and Chat Vector. Links to the other models are listed in the index.
+# Model Release Updates
+We are excited to share the release schedule for our latest models:
+- **July 1, 2024**: Released the [Llama-3-Swallow-8B-v0.1](https://huggingface.co/tokyotech-llm/Llama-3-Swallow-8B-v0.1), [Llama-3-Swallow-8B-Instruct-v0.1](https://huggingface.co/tokyotech-llm/Llama-3-Swallow-8B-Instruct-v0.1), [Llama-3-Swallow-70B-v0.1](https://huggingface.co/tokyotech-llm/Llama-3-Swallow-70B-v0.1), and [Llama-3-Swallow-70B-Instruct-v0.1](https://huggingface.co/tokyotech-llm/Llama-3-Swallow-70B-Instruct-v0.1).
+## Swallow Model Index
+|Model|Llama-3-Swallow|Llama-3-Swallow-Instruct|
+|---|---|---|
+|8B| [Link](https://huggingface.co/tokyotech-llm/Llama-3-Swallow-8B-v0.1) | [Link](https://huggingface.co/tokyotech-llm/Llama-3-Swallow-8B-Instruct-v0.1) |
+|70B| [Link](https://huggingface.co/tokyotech-llm/Llama-3-Swallow-70B-v0.1) | [Link](https://huggingface.co/tokyotech-llm/Llama-3-Swallow-70B-Instruct-v0.1) |
+![logo](./logo.png)
+This repository provides large language models developed by [Swallow-LLM](https://swallow-llm.github.io/).
+Read our [blog post](https://zenn.dev/tokyotech_lm/articles/f65989d76baf2c).
+## Model Details
+* **Model type**: Please refer to [Llama 3 MODEL_CARD](https://github.com/meta-llama/llama3/blob/main/MODEL_CARD.md) for details on the model architecture.
+* **Language(s)**: Japanese, English
+* **Library**: [Megatron-LM](https://github.com/NVIDIA/Megatron-LM)
+* **Tokenizer**: Please refer to [Llama 3 blog](https://ai.meta.com/blog/meta-llama-3/) for details on the tokenizer.
+* **Contact**: swallow[at]nlp.c.titech.ac.jp
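The front matter declares `library_name: transformers` and `pipeline_tag: text-generation`, so the checkpoint should load through the standard Transformers causal-LM classes. The following is a minimal sketch under assumptions, not an official usage recipe: the helper names `load_model` and `complete`, the `bfloat16` dtype, `device_map="auto"` (which needs the `accelerate` package), and greedy decoding are all illustrative choices that do not come from this card.

```python
MODEL_ID = "tokyotech-llm/Llama-3-Swallow-8B-v0.1"


def load_model(model_id: str = MODEL_ID):
    """Load tokenizer and model (hypothetical helper, not part of this repository)."""
    # Heavy dependencies are imported lazily so this module stays importable
    # on machines that only want to read the constants.
    import torch
    from transformers import AutoModelForCausalLM, AutoTokenizer

    tokenizer = AutoTokenizer.from_pretrained(model_id)
    model = AutoModelForCausalLM.from_pretrained(
        model_id,
        torch_dtype=torch.bfloat16,  # assumption: hardware with bfloat16 support
        device_map="auto",           # assumption: `accelerate` is installed
    )
    return tokenizer, model


def complete(tokenizer, model, prompt: str, max_new_tokens: int = 64) -> str:
    """Greedy continuation of `prompt`; a base model continues text, it does not chat."""
    inputs = tokenizer(prompt, return_tensors="pt").to(model.device)
    output_ids = model.generate(
        **inputs, max_new_tokens=max_new_tokens, do_sample=False
    )
    return tokenizer.decode(output_ids[0], skip_special_tokens=True)
```

Calling `tokenizer, model = load_model()` followed by `complete(tokenizer, model, "東京工業大学の主なキャンパスは、")` would download the weights and return the prompt plus its continuation. Because this is the base model rather than the Instruct variant, plain text prompts are likely a better fit than chat-style messages.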
+## Model Performance
+### Japanese tasks
+|Model|Size|JCom.|JEMHopQA|NIILC|JSQuAD|XL-Sum|MGSM|WMT20-en-ja|WMT20-ja-en|JMMLU|JHumanEval|Ja Avg|
+|---|---|---|---|---|---|---|---|---|---|---|---|---|
+|   |   |4-shot|4-shot|4-shot|4-shot|1-shot|4-shot|4-shot|4-shot|5-shot|0-shot|   |
+|   |   |EM acc|Char-F1|Char-F1|Char-F1|ROUGE-2|EM acc|BLEU|BLEU|EM acc|pass@1|   |
+|Llama-2-7b|7B|0.2618|0.4914|0.3301|0.8001|0.1742|0.0560|0.1764|0.1742|0.2824|0.1250|0.2872|
+|Swallow-7b-hf|7B|0.4888|0.5044|**0.5925**|0.8424|0.1823|0.1240|0.2505|0.1482|0.3219|0.0183|0.3473|
+|Mistral-7B-v0.1|7B|0.7471|0.4482|0.2691|0.8588|0.2026|0.1880|0.1430|0.1738|0.4213|0.2598|0.3712|
+|Swallow-MS-7b-v0.1|7B|0.8758|**0.5153**|0.5647|0.8762|0.1993|0.2400|0.2507|0.1667|0.4527|0.2335|0.4375|
+|Qwen2-7B|7B|0.8776|0.4627|0.3766|**0.8984**|0.1716|**0.5480**|0.2080|0.1949|**0.5871**|**0.4183**|**0.4805**|
+|Meta-Llama-3-8B|8B|0.8356|0.4454|0.4002|0.8881|0.1757|0.3320|0.2199|0.2087|0.4558|0.3311|0.4292|
+|llama-3-youko-8b|8B|0.8660|0.4902|0.5155|0.8947|**0.2127**|0.2840|0.2740|0.2180|0.4493|0.2183|0.4423|
+|Llama-3-Swallow-8B-v0.1|8B|**0.8945**|0.4848|0.5640|0.8947|0.1981|0.4240|**0.2758**|**0.2223**|0.4699|0.2890|0.4717|
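The card does not say how the Ja Avg column is computed. The numbers appear consistent with an unweighted mean of the ten task scores, and the sketch below checks that reading against two rows copied from the table. The dictionary and function names are illustrative only.

```python
from statistics import mean

# Per-task scores copied from the table, in column order:
# JCom., JEMHopQA, NIILC, JSQuAD, XL-Sum, MGSM, WMT20-en-ja, WMT20-ja-en, JMMLU, JHumanEval
JA_SCORES = {
    "Meta-Llama-3-8B": [
        0.8356, 0.4454, 0.4002, 0.8881, 0.1757,
        0.3320, 0.2199, 0.2087, 0.4558, 0.3311,
    ],
    "Llama-3-Swallow-8B-v0.1": [
        0.8945, 0.4848, 0.5640, 0.8947, 0.1981,
        0.4240, 0.2758, 0.2223, 0.4699, 0.2890,
    ],
}

# Ja Avg values as printed in the table.
REPORTED_JA_AVG = {
    "Meta-Llama-3-8B": 0.4292,
    "Llama-3-Swallow-8B-v0.1": 0.4717,
}


def ja_avg(model_name: str) -> float:
    """Unweighted mean over the ten Japanese tasks (assumed definition of Ja Avg)."""
    return mean(JA_SCORES[model_name])


for name, reported in REPORTED_JA_AVG.items():
    print(f"{name}: computed={ja_avg(name):.5f} reported={reported:.4f}")
```

Under this reading every task carries the same weight, so a gain on a low-scoring metric such as XL-Sum ROUGE-2 counts as much as the same absolute gain on JCommonsenseQA accuracy.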
+### English tasks
+|Model|Size|OpenBookQA|TriviaQA|HellaSwag|SQuAD2.0|XWINO|MMLU|GSM8K|BBH|HumanEval|En Avg|
+|---|---|---|---|---|---|---|---|---|---|---|---|
+|   |   |4-shot|4-shot|4-shot|4-shot|4-shot|5-shot|4-shot|3-shot|0-shot|   |
+|   |   |Acc|EM acc|Acc|EM acc|Acc|Acc|EM acc|CoT EM Acc|pass@1|   |
+|Llama-2-7b|7B|0.3720|0.6385|0.5826|0.2911|0.9045|0.4590|0.1266|0.3993|0.1354|0.4343|
+|Swallow-7b-hf|7B|0.3080|0.4921|0.5269|0.2608|0.8847|0.3918|0.0963|0.3531|0.0402|0.3727|
+|Mistral-7B-v0.1|7B|0.3740|0.7030|**0.6260**|0.3381|**0.9067**|0.6236|0.3851|0.5597|0.2841|0.5334|
+|Swallow-MS-7b-v0.1|7B|0.3480|0.5995|0.5798|0.3011|0.9015|0.5486|0.2669|0.4916|0.2732|0.4789|
+|Qwen2-7B|7B|0.3740|0.6105|0.6006|**0.3623**|0.8916|**0.7045**|**0.7748**|0.5325|**0.4622**|**0.5903**|
+|Meta-Llama-3-8B|8B|**0.3760**|**0.7109**|0.6124|0.3356|0.9032|0.6509|0.4936|**0.6211**|0.3793|0.5648|
+|llama-3-youko-8b|8B|0.3500|0.6252|0.5885|0.3247|0.8959|0.5993|0.3571|0.5704|0.2793|0.5100|
+|Llama-3-Swallow-8B-v0.1|8B|0.3520|0.6563|0.5901|0.3507|0.9006|0.6152|0.4875|0.5936|0.3323|0.5420|
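To see how continual pre-training shifted English performance, the rows for Meta-Llama-3-8B and Llama-3-Swallow-8B-v0.1 can be compared task by task. The sketch below only does arithmetic on values copied from the table; the variable names are illustrative, and no significance testing is implied.

```python
EN_TASKS = [
    "OpenBookQA", "TriviaQA", "HellaSwag", "SQuAD2.0", "XWINO",
    "MMLU", "GSM8K", "BBH", "HumanEval",
]

# Scores copied from the table, in the column order above.
BASE = [0.3760, 0.7109, 0.6124, 0.3356, 0.9032, 0.6509, 0.4936, 0.6211, 0.3793]
SWALLOW = [0.3520, 0.6563, 0.5901, 0.3507, 0.9006, 0.6152, 0.4875, 0.5936, 0.3323]

# Per-task change: positive means Llama-3-Swallow-8B-v0.1 scores higher.
deltas = {task: s - b for task, b, s in zip(EN_TASKS, BASE, SWALLOW)}
improved = [task for task, d in deltas.items() if d > 0]
mean_delta = sum(deltas.values()) / len(deltas)

for task, d in deltas.items():
    print(f"{task:<10} {d:+.4f}")
print(f"mean delta: {mean_delta:+.4f}")
print(f"tasks where Swallow is higher: {improved}")
```

On these numbers only SQuAD2.0 rises, and the mean change of roughly -0.023 matches the gap between the two En Avg entries (0.5420 versus 0.5648).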
+## Evaluation Benchmarks
+### Japanese evaluation benchmarks
+We used llm-jp-eval (v1.3.0), JP Language Model Evaluation Harness (commit #9b42d41), and Code Generation LM Evaluation Harness (commit #0261c52). The benchmarks are as follows:
+- Multiple-choice question answering (JCommonsenseQA [Kurihara et al., 2022])
+- Open-ended question answering (JEMHopQA [Ishii et al., 2024])
+- Open-ended question answering (NIILC [Sekine, 2003])
+- Machine reading comprehension (JSQuAD [Kurihara et al., 2022])
+- Automatic summarization (XL-Sum [Hasan et al., 2021])
+- Machine translation (WMT2020 ja-en [Barrault et al., 2020])
+- Machine translation (WMT2020 en-ja [Barrault et al., 2020])
+- Mathematical reasoning (MGSM [Shi et al., 2023])
+- Academic exams (JMMLU [Yin et al., 2024])
+- Code generation (JHumanEval [Sato et al., 2024])
+### English evaluation benchmarks
+We used the Language Model Evaluation Harness (v0.4.2) and Code Generation LM Evaluation Harness (commit #0261c52). The benchmarks are as follows:
+- Multiple-choice question answering (OpenBookQA [Mihaylov et al., 2018])
+- Open-ended question answering (TriviaQA [Joshi et al., 2017])
+- Machine reading comprehension (SQuAD2 [Rajpurkar et al., 2018])
+- Commonsense reasoning (XWINO [Tikhonov and Ryabinin, 2021])
+- Natural language inference (HellaSwag [Zellers et al., 2019])
+- Mathematical reasoning (GSM8K [Cobbe et al., 2021])
+- Reasoning (BBH (BIG-Bench-Hard) [Suzgun et al., 2023])
+- Academic exams (MMLU [Hendrycks et al., 2021])
+- Code generation (HumanEval [Chen et al., 2021])
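The code-generation results are reported as pass@1. For reference, the sketch below implements the unbiased pass@k estimator from the HumanEval paper [Chen et al., 2021], which for each problem draws `n` samples, counts the `c` that pass the unit tests, and estimates `1 - C(n-c, k) / C(n, k)`. This card does not state how many samples per problem were generated, so the values of `n` used here are assumptions for illustration.

```python
from math import comb


def pass_at_k(n: int, c: int, k: int) -> float:
    """Unbiased pass@k for one problem: n samples drawn, c of them correct."""
    if not (0 <= c <= n and 1 <= k <= n):
        raise ValueError("require 0 <= c <= n and 1 <= k <= n")
    # comb(n - c, k) is 0 when fewer than k samples are incorrect,
    # in which case every size-k subset contains a correct sample.
    return 1.0 - comb(n - c, k) / comb(n, k)


def benchmark_pass_at_k(results: list[tuple[int, int]], k: int) -> float:
    """Average pass@k over problems; each item is (n_samples, n_correct)."""
    return sum(pass_at_k(n, c, k) for n, c in results) / len(results)


# Hypothetical per-problem results, not taken from the evaluation above.
example = [(10, 3), (10, 0), (10, 10), (10, 1)]
print(f"pass@1 = {benchmark_pass_at_k(example, k=1):.4f}")
```

With `k = 1` the estimator reduces to `c / n`, the fraction of correct samples, so with a single greedy sample per problem pass@1 is simply the share of problems solved.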
+## Training Datasets
+### Continual Pre-Training
+The following datasets were used for continual pre-training.
+- [Algebraic Stack](https://huggingface.co/datasets/EleutherAI/proof-pile-2)
+- [Cosmopedia](https://huggingface.co/datasets/HuggingFaceTB/cosmopedia)
+- [English Wikipedia](https://dumps.wikimedia.org/other/cirrussearch)
+- [Japanese Wikipedia](https://dumps.wikimedia.org/other/cirrussearch)
+- [Laboro ParaCorpus](https://github.com/laboroai/Laboro-ParaCorpus)
+- [OpenWebMath](https://huggingface.co/datasets/EleutherAI/proof-pile-2)
+- [RefinedWeb](https://huggingface.co/datasets/tiiuae/falcon-refinedweb)
+- [Swallow Corpus](https://arxiv.org/abs/2404.17733)
+## Risks and Limitations
+The models released here are still in the early stages of our research and development and have not been tuned to ensure outputs align with human intent and safety considerations.
+## Acknowledgements
+We thank Meta Research for releasing Llama 3 under an open license for others to build on.
+Our project is supported by the [Large Generative AI Development Support Program](https://abci.ai/en/link/lfm_support_program.html) of the National Institute of Advanced Industrial Science and Technology.
+## License
+[META LLAMA 3 COMMUNITY LICENSE](https://llama.meta.com/llama3/license/)
+## Authors
+Here are the team members:
+- From [Tokyo Institute of Technology Okazaki Laboratory](https://www.nlp.c.titech.ac.jp/index.en.html), the following members:
+  - [Naoaki Okazaki](https://www.chokkan.org/index.ja.html)
+  - [Sakae Mizuki](https://s-mizuki-nlp.github.io/)
+  - [Youmi Ma](https://www.nlp.c.titech.ac.jp/member/youmi.en.html)
+  - [Koki Maeda](https://sites.google.com/view/silviase)
+  - [Kakeru Hattori](https://aya-se.vercel.app/)
+  - [Masanari Ohi](https://sites.google.com/view/masanariohi)
+  - [Taihei Shiotani](https://github.com/inatoihs)
+  - [Koshiro Saito](https://sites.google.com/view/koshiro-saito)
+- From [Tokyo Institute of Technology YOKOTA Laboratory](https://www.rio.gsic.titech.ac.jp/en/index.html), the following members:
+  - [Rio Yokota](https://twitter.com/rioyokota)
+  - [Kazuki Fujii](https://twitter.com/okoge_kaz)
+  - [Taishi Nakamura](https://twitter.com/Setuna7777_2)
+  - [Takumi Okamoto](https://www.linkedin.com/in/takumi-okamoto)
+  - [Ishida Shigeki](https://www.wantedly.com/id/reborn27)
+- From [Artificial Intelligence Research Center, AIST, Japan](https://www.airc.aist.go.jp/en/teams/), the following members:
+  - [Hiroya Takamura](https://sites.google.com/view/hjtakamura)
+## How to Cite
+If you find our work helpful, please feel free to cite us.
+```tex
+@misc{llama3swallow,
+      title={Llama 3 Swallow},
+      url={https://swallow-llm.github.io/llama3-swallow.en.html},
+      author={Swallow LLM},
+      year={2024},
+}
+```
+### Citations
+```tex
+@article{llama3modelcard,
+    title={Llama 3 Model Card},
+    author={AI@Meta},
+    year={2024},
+    url = {https://github.com/meta-llama/llama3/blob/main/MODEL_CARD.md}
+}
+```