---
language:
- en
- ja
library_name: transformers
pipeline_tag: text-generation
license: llama3
model_type: llama
---
|
|
|
# Llama3 Swallow |
|
|
|
Our Swallow models were built by continual pre-training of the [Llama 3 family](https://huggingface.co/collections/meta-llama/meta-llama-3-66214712577ca38149ebb2b6), primarily on additional Japanese language data. The Instruct versions were trained with supervised fine-tuning (SFT) and Chat Vector. Links to all models are listed in the Swallow Model Index below.
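Chat Vector (Huang et al., 2023) is a weight-arithmetic technique. It takes the parameter difference between an instruction-tuned model and its base model and adds that difference to a continually pre-trained copy of the same base. The sketch below shows only that arithmetic, using toy per-parameter lists of floats. It assumes all three models share identical parameter names and shapes. It is not the Swallow training code, and this card does not say how SFT and Chat Vector were combined. The function names, the `scale` factor, and the toy weights are hypothetical.

```python
from typing import Dict, List

# Toy representation (assumption): parameter name -> flattened list of weights.
Weights = Dict[str, List[float]]


def chat_vector(instruct: Weights, base: Weights) -> Weights:
    """Element-wise difference (instruct - base) for every parameter."""
    return {
        name: [i - b for i, b in zip(instruct[name], base[name])]
        for name in base
    }


def apply_chat_vector(cp_model: Weights, vector: Weights, scale: float = 1.0) -> Weights:
    """Add the (optionally scaled) chat vector to the continually pre-trained weights."""
    return {
        name: [w + scale * v for w, v in zip(cp_model[name], vector[name])]
        for name in cp_model
    }


# Hypothetical toy weights for a single parameter tensor.
base = {"layer.weight": [1.0, 2.0]}       # stands in for Llama 3 base
instruct = {"layer.weight": [1.5, 1.0]}   # stands in for Llama 3 Instruct
cp_model = {"layer.weight": [1.25, 2.5]}  # stands in for the Japanese continual pre-training

merged = apply_chat_vector(cp_model, chat_vector(instruct, base))
print(merged)  # {'layer.weight': [1.75, 1.5]}
```

In real checkpoints, the same loop would run over every tensor in the state dict. The toy values above are exact in binary floating point, which keeps the printed result predictable.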
|
|
|
|
|
# Model Release Updates |
|
|
|
We are excited to share the release schedule for our latest models: |
|
- **July 1, 2024**: Released the [Llama-3-Swallow-8B-v0.1](https://huggingface.co/tokyotech-llm/Llama-3-Swallow-8B-v0.1), [Llama-3-Swallow-8B-Instruct-v0.1](https://huggingface.co/tokyotech-llm/Llama-3-Swallow-8B-Instruct-v0.1), [Llama-3-Swallow-70B-v0.1](https://huggingface.co/tokyotech-llm/Llama-3-Swallow-70B-v0.1), and [Llama-3-Swallow-70B-Instruct-v0.1](https://huggingface.co/tokyotech-llm/Llama-3-Swallow-70B-Instruct-v0.1). |
|
|
|
## Swallow Model Index |
|
|
|
|Size|Llama-3-Swallow|Llama-3-Swallow-Instruct|
|---|---|---|
|8B| [Link](https://huggingface.co/tokyotech-llm/Llama-3-Swallow-8B-v0.1) | [Link](https://huggingface.co/tokyotech-llm/Llama-3-Swallow-8B-Instruct-v0.1) |
|70B| [Link](https://huggingface.co/tokyotech-llm/Llama-3-Swallow-70B-v0.1) | [Link](https://huggingface.co/tokyotech-llm/Llama-3-Swallow-70B-Instruct-v0.1) |
|
|
|
![logo](./logo.png) |
|
|
|
This repository provides large language models developed by [Swallow-LLM](https://swallow-llm.github.io/). |
|
Read our [blog post](https://zenn.dev/tokyotech_lm/articles/f65989d76baf2c). |
|
|
|
## Model Details |
|
|
|
* **Model type**: Please refer to [Llama 3 MODEL_CARD](https://github.com/meta-llama/llama3/blob/main/MODEL_CARD.md) for details on the model architecture. |
|
* **Language(s)**: Japanese, English
|
* **Library**: [Megatron-LM](https://github.com/NVIDIA/Megatron-LM) |
|
* **Tokenizer**: Please refer to [Llama 3 blog](https://ai.meta.com/blog/meta-llama-3/) for details on the tokenizer. |
|
* **Contact**: swallow[at]nlp.c.titech.ac.jp |
|
|
|
## Model Performance |
|
|
|
### Japanese tasks |
|
|
|
|Model|Size|JCom.|JEMHopQA|NIILC|JSQuAD|XL-Sum|MGSM|WMT20-en-ja|WMT20-ja-en|JMMLU|JHumanEval|Ja Avg|
|---|---|---|---|---|---|---|---|---|---|---|---|---|
| | |4-shot|4-shot|4-shot|4-shot|1-shot|4-shot|4-shot|4-shot|5-shot|0-shot| |
| | |EM acc|Char-F1|Char-F1|Char-F1|ROUGE-2|EM acc|BLEU|BLEU|EM acc|pass@1| |
|karakuri-lm-70b-chat-v0.1|70B|0.8847|0.5139|0.5668|0.9096|0.1369|0.2800|0.2526|0.2095|0.4648|0.2354|0.4454|
|Meta-Llama-3-70B-Instruct|70B|0.9419|0.6114|0.5506|0.9164|0.1912|0.7200|0.2708|0.2350|0.6789|0.6610|0.5777|
|Llama-3-Swallow-70B-Instruct-v0.1|70B|0.9607|0.6188|0.6026|0.9236|0.1389|0.6560|0.2724|0.2532|0.6572|0.6000|0.5683|
|Qwen2-72B-Instruct|72B|0.9634|0.6268|0.5418|0.9210|0.1644|0.7840|0.2592|0.2327|0.7713|0.6909|0.5955|
|
|
|
### English tasks |
|
|
|
|Model|Size|OpenBookQA|TriviaQA|HellaSwag|SQuAD2.0|XWINO|MMLU|GSM8K|BBH|HumanEval|En Avg|
|---|---|---|---|---|---|---|---|---|---|---|---|
| | |4-shot|4-shot|4-shot|4-shot|4-shot|5-shot|4-shot|3-shot|0-shot| |
| | |Acc|EM acc|Acc|EM acc|Acc|Acc|EM acc|CoT EM acc|pass@1| |
|karakuri-lm-70b-chat-v0.1|70B|0.4100|0.6873|0.6315|0.3677|0.9049|0.5941|0.3882|0.5724|0.2305|0.5319|
|Meta-Llama-3-70B-Instruct|70B|0.4400|0.7999|0.6552|0.4024|0.9127|0.7992|0.9052|0.8326|0.7555|0.7225|
|Llama-3-Swallow-70B-Instruct-v0.1|70B|0.4520|0.8174|0.6758|0.4050|0.9230|0.7883|0.8688|0.8152|0.6890|0.7150|
|Qwen2-72B-Instruct|72B|0.4360|0.7588|0.6857|0.3913|0.9110|0.8391|0.8499|0.2436|0.6939|0.6455|
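The average columns appear to be unweighted means of the per-task scores. Recomputing them from the Llama-3-Swallow-70B-Instruct-v0.1 rows of the Japanese and English tables above reproduces Ja Avg (0.5683) and En Avg (0.7150) to within rounding. The snippet below is a quick check of that reading, not the authors' evaluation code. The small residual in En Avg is presumably due to averaging before rounding.

```python
from statistics import mean

# Per-task scores for Llama-3-Swallow-70B-Instruct-v0.1, copied from the tables above.
ja_scores = {
    "JCom.": 0.9607, "JEMHopQA": 0.6188, "NIILC": 0.6026, "JSQuAD": 0.9236,
    "XL-Sum": 0.1389, "MGSM": 0.6560, "WMT20-en-ja": 0.2724,
    "WMT20-ja-en": 0.2532, "JMMLU": 0.6572, "JHumanEval": 0.6000,
}
en_scores = {
    "OpenBookQA": 0.4520, "TriviaQA": 0.8174, "HellaSwag": 0.6758,
    "SQuAD2.0": 0.4050, "XWINO": 0.9230, "MMLU": 0.7883, "GSM8K": 0.8688,
    "BBH": 0.8152, "HumanEval": 0.6890,
}

# Unweighted mean over tasks (assumed aggregation rule).
ja_avg = mean(ja_scores.values())
en_avg = mean(en_scores.values())
print(f"Ja Avg ~ {ja_avg:.4f}, En Avg ~ {en_avg:.4f}")  # Ja Avg ~ 0.5683, En Avg ~ 0.7149
```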
|
|
|
### MT-Bench JA
|
|
|
|Model|Size|coding|extraction|humanities|math|reasoning|roleplay|stem|writing|JMT Avg|
|---|---|---|---|---|---|---|---|---|---|---|
|karakuri-lm-70b-chat-v0.1|70B|0.2804|0.5862|0.6240|0.2934|0.4183|0.5530|0.4859|0.5964|0.4797|
|Meta-Llama-3-70B-Instruct|70B|0.5969|0.8410|0.7120|0.4481|0.4884|0.7117|0.6510|0.6900|0.6424|
|Llama-3-Swallow-70B-Instruct-v0.1|70B|0.5269|0.7250|0.5690|0.4669|0.6121|0.6238|0.5533|0.5698|0.5809|
|Qwen2-72B-Instruct|72B|0.5699|0.7858|0.8222|0.5096|0.7032|0.7963|0.7728|0.8223|0.7228|
|GPT-3.5 (gpt-3.5-turbo-0125)| |0.6851|0.7641|0.7414|0.5522|0.5128|0.7104|0.6266|0.7361|0.6661|
|GPT-4o (gpt-4o-2024-05-13)| |0.7296|0.8540|0.8646|0.6641|0.6661|0.8274|0.8184|0.8085|0.7791|
|
|
|
## Evaluation Benchmarks |
|
|
|
### Japanese evaluation benchmarks |
|
|
|
We used llm-jp-eval (v1.3.0), JP Language Model Evaluation Harness (commit #9b42d41), and Code Generation LM Evaluation Harness (commit #0261c52). The details are as follows:
|
|
|
- Multiple-choice question answering (JCommonsenseQA [Kurihara et al., 2022]) |
|
- Open-ended question answering (JEMHopQA [Ishii et al., 2024]) |
|
- Open-ended question answering (NIILC [Sekine, 2003])
|
- Machine reading comprehension (JSQuAD [Kurihara et al., 2022]) |
|
- Automatic summarization (XL-Sum [Hasan et al., 2021]) |
|
- Machine translation (WMT2020 ja-en [Barrault et al., 2020]) |
|
- Machine translation (WMT2020 en-ja [Barrault et al., 2020]) |
|
- Mathematical reasoning (MGSM [Shi et al., 2023]) |
|
- Academic exams (JMMLU [Yin et al., 2024])

- Code generation (JHumanEval [Sato et al., 2024])
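JEMHopQA, NIILC, and JSQuAD are scored with Char-F1. The exact computation is whatever llm-jp-eval v1.3.0 implements, and that may differ from this sketch (for example, it may rely on fuzzy string matching). As an assumption, the snippet shows one common definition: F1 over the multiset of characters shared by the prediction and the reference. This definition avoids the need for a Japanese word tokenizer. `char_f1` is a hypothetical helper name.

```python
from collections import Counter


def char_f1(prediction: str, reference: str) -> float:
    """Bag-of-characters F1 between two strings (assumed definition, not llm-jp-eval's code)."""
    pred_chars = Counter(prediction.strip())
    ref_chars = Counter(reference.strip())
    overlap = sum((pred_chars & ref_chars).values())  # shared characters, counted with multiplicity
    if overlap == 0:
        return 0.0
    precision = overlap / sum(pred_chars.values())
    recall = overlap / sum(ref_chars.values())
    return 2 * precision * recall / (precision + recall)


# "東京都" (Tokyo Metropolis) vs. reference "東京" (Tokyo): precision 2/3, recall 1.
print(char_f1("東京都", "東京"))  # roughly 0.8
```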
|
|
|
### English evaluation benchmarks |
|
|
|
We used the Language Model Evaluation Harness (v0.4.2) and Code Generation LM Evaluation Harness (commit #0261c52). The details are as follows:
|
|
|
- Multiple-choice question answering (OpenBookQA [Mihaylov et al., 2018]) |
|
- Open-ended question answering (TriviaQA [Joshi et al., 2017]) |
|
- Machine reading comprehension (SQuAD2 [Rajpurkar et al., 2018]) |
|
- Commonsense reasoning (XWINO [Tikhonov and Ryabinin, 2021]) |
|
- Natural language inference (HellaSwag [Zellers et al., 2019]) |
|
- Mathematical reasoning (GSM8K [Cobbe et al., 2021]) |
|
- Reasoning (BBH (BIG-Bench-Hard) [Suzgun et al., 2023]) |
|
- Academic exams (MMLU [Hendrycks et al., 2021]) |
|
- Code generation (HumanEval [Chen et al., 2021]) |
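HumanEval and JHumanEval are reported as pass@1. Chen et al. (2021) define an unbiased estimator of pass@k from n sampled completions per problem, of which c pass the unit tests: pass@k = 1 - C(n-c, k) / C(n, k), averaged over problems. For k = 1, this reduces to c/n. The sketch below implements that formula with exact integer binomials. This card does not state how many samples per problem the Code Generation LM Evaluation Harness run used, so the `n` and `c` values in the example are purely illustrative.

```python
from math import comb
from typing import Sequence, Tuple


def pass_at_k(n: int, c: int, k: int) -> float:
    """Unbiased pass@k for one problem: n samples drawn, c of them correct."""
    if not 0 <= c <= n or not 1 <= k <= n:
        raise ValueError("require 0 <= c <= n and 1 <= k <= n")
    if n - c < k:
        # Every size-k subset of samples contains at least one correct one.
        return 1.0
    return 1.0 - comb(n - c, k) / comb(n, k)


def mean_pass_at_k(results: Sequence[Tuple[int, int]], k: int) -> float:
    """Average pass@k over problems, given one (n, c) pair per problem."""
    return sum(pass_at_k(n, c, k) for n, c in results) / len(results)


# Illustrative numbers: 3 problems, 10 samples each.
print(mean_pass_at_k([(10, 10), (10, 5), (10, 0)], k=1))  # 0.5
```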
|
|
|
### MT-Bench JA |
|
|
|
We used [Japanese MT-Bench](https://wandb.ai/wandb-japan/llm-leaderboard/artifacts/dataset/mtbench_ja_question) to assess the instruction-following capabilities of models. |
|
We utilized the following settings: |
|
|
|
- Implementation: FastChat [Zheng+, 2023] (commit #e86e70d0)
|
- Question: [Nejumi LLM-Leaderboard NEO, mtbench_ja_question_v3](https://wandb.ai/wandb-japan/llm-leaderboard/artifacts/dataset/mtbench_ja_question/v3) |
|
- Reference Answer: [Nejumi LLM-Leaderboard NEO, mtbench_ja_referenceanswer_v1](https://wandb.ai/wandb-japan/llm-leaderboard/artifacts/dataset/mtbench_ja_referenceanswer/v1) |
|
- Prompt for Judge: [Nejumi LLM-Leaderboard NEO, mtbench_ja_prompt_v1](https://wandb.ai/wandb-japan/llm-leaderboard/artifacts/dataset/mtbench_ja_prompt/v1)
|
- Judge: `gpt-4-1106-preview` |
|
- Scoring: Absolute scale normalized to a 0-1 range, averaged over five runs. |
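The scoring rule above can be read as follows: the judge assigns each answer an absolute score, the scores are mapped to the 0-1 range, and the result is averaged over the five runs. The sketch below rests on two assumptions. First, the judge uses MT-Bench's usual 1-10 rating scale. Second, normalization is a plain division by 10. FastChat's own aggregation code is authoritative. The function name and data layout are hypothetical.

```python
from statistics import mean
from typing import Dict, List

# Hypothetical layout: category -> one list of judge scores (1-10) per run.
RunScores = Dict[str, List[List[float]]]


def normalized_category_scores(scores: RunScores, max_score: float = 10.0) -> Dict[str, float]:
    """Average each category over questions, normalize to 0-1 (assumed: divide by 10), then average over runs."""
    result = {}
    for category, runs in scores.items():
        per_run = [mean(run) / max_score for run in runs]  # one normalized score per run
        result[category] = mean(per_run)                   # average over the runs
    return result


example = {
    "coding": [[6, 8], [7, 7], [5, 9], [8, 6], [6, 8]],  # five runs, two questions each
    "math": [[4, 6], [5, 5], [6, 4], [5, 5], [5, 5]],
}
per_category = normalized_category_scores(example)
overall = mean(per_category.values())  # unweighted mean over categories
print(per_category, overall)
```

The unweighted mean over categories matches the JMT Avg column of the results table, which agrees with the mean of the eight category scores to within rounding.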
|
|
|
## Usage |
|
|
|
```sh
pip install vllm
```
|
|
|
```python
from transformers import AutoTokenizer
from vllm import LLM, SamplingParams

model_name = "tokyotech-llm/Llama-3-Swallow-70B-Instruct-v0.1"

# The tokenizer is used only to render the chat template into a prompt string.
tokenizer = AutoTokenizer.from_pretrained(model_name)

# Shard the 70B model across 4 GPUs with tensor parallelism; adjust to your hardware.
llm = LLM(
    model=model_name,
    tensor_parallel_size=4,
)

# Stop at <|eot_id|>, the token that closes each turn in the Llama 3 chat format.
sampling_params = SamplingParams(
    temperature=0.6, top_p=0.9, max_tokens=512, stop="<|eot_id|>"
)

messages = [
    # System: "You are a sincere and capable Japanese assistant."
    {"role": "system", "content": "あなたは誠実で優秀な日本人のアシスタントです。"},
    {
        "role": "user",
        # User: "Please write a heartwarming story about a swallow and a llama
        # facing each other beneath fireworks bursting in the Tokyo night sky."
        "content": "東京の夜空に打ち上がっている花火の下、向かい合っている燕とラマの温かい物語を書いてください。",
    },
]

# Render the conversation as text and append the assistant header.
prompt = tokenizer.apply_chat_template(
    messages, tokenize=False, add_generation_prompt=True
)

output = llm.generate(prompt, sampling_params)

print(output[0].outputs[0].text)
```
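The `stop="<|eot_id|>"` setting follows from the prompt format. Assume the model ships the standard Llama 3 Instruct chat template (check `tokenizer.chat_template`; the string returned by `apply_chat_template` is authoritative). Under that assumption, the prompt built above looks like the output of this stdlib-only sketch. Every turn is closed by `<|eot_id|>`, so the same token marks the end of the generated reply. `format_llama3_prompt` is a hypothetical helper, not part of transformers or vLLM.

```python
from typing import Dict, List

BOS = "<|begin_of_text|>"


def format_llama3_prompt(messages: List[Dict[str, str]], add_generation_prompt: bool = True) -> str:
    """Approximate the Llama 3 Instruct chat template (assumed, not read from the tokenizer)."""
    parts = [BOS]
    for message in messages:
        parts.append(
            f"<|start_header_id|>{message['role']}<|end_header_id|>\n\n"
            f"{message['content'].strip()}<|eot_id|>"
        )
    if add_generation_prompt:
        # Open an assistant turn; the model writes its reply and then emits <|eot_id|>.
        parts.append("<|start_header_id|>assistant<|end_header_id|>\n\n")
    return "".join(parts)


prompt = format_llama3_prompt([
    # Same system prompt as the usage example above.
    {"role": "system", "content": "あなたは誠実で優秀な日本人のアシスタントです。"},
    {"role": "user", "content": "Hello"},
])
print(prompt)
```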
|
|
|
## Training Datasets |
|
|
|
### Instruction Tuning |
|
|
|
The following datasets were used for instruction tuning.

- [OpenAssistant Conversations Dataset EN top-1 thread](https://huggingface.co/datasets/OpenAssistant/oasst2)

- [OpenAssistant Conversations Dataset (Japanese)](https://huggingface.co/datasets/llm-jp/oasst1-21k-ja): only the human utterances were used. The original responses were discarded and regenerated with the [Mixtral-8x7B-Instruct-v0.1](https://huggingface.co/mistralai/Mixtral-8x7B-Instruct-v0.1) model.
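The response-regeneration step described above can be sketched as follows. Human turns are kept verbatim, and every assistant turn is replaced by a fresh completion conditioned on the rebuilt conversation so far. `generate_response` stands in for a call to Mixtral-8x7B-Instruct-v0.1 (for example, through vLLM). It is a hypothetical placeholder, and regenerating turn by turn in this way is an assumption about the authors' pipeline.

```python
from typing import Callable, Dict, List

Message = Dict[str, str]


def regenerate_responses(
    conversation: List[Message],
    generate_response: Callable[[List[Message]], str],
) -> List[Message]:
    """Keep 'user' turns verbatim and replace 'assistant' turns; other roles are dropped in this sketch."""
    rebuilt: List[Message] = []
    for message in conversation:
        if message["role"] == "user":
            rebuilt.append({"role": "user", "content": message["content"]})
        elif message["role"] == "assistant":
            # Discard the original response; generate a new one from the rebuilt history.
            rebuilt.append({"role": "assistant", "content": generate_response(rebuilt)})
    return rebuilt


def fake_generator(history: List[Message]) -> str:
    """Stand-in for the real model: echoes the last user turn."""
    return f"reply to: {history[-1]['content']}"


original = [
    {"role": "user", "content": "question 1"},
    {"role": "assistant", "content": "old answer 1"},
    {"role": "user", "content": "question 2"},
    {"role": "assistant", "content": "old answer 2"},
]
result = regenerate_responses(original, fake_generator)
print(result)
```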
|
|
|
|
|
## Risks and Limitations |
|
|
|
The models released here are still in the early stages of our research and development and have not been tuned to ensure outputs align with human intent and safety considerations. |
|
|
|
## Acknowledgements |
|
|
|
We thank Meta Research for releasing Llama 3 under an open license for others to build on. |
|
|
|
Our project is supported by the [Large Generative AI Development Support Program](https://abci.ai/en/link/lfm_support_program.html) of the National Institute of Advanced Industrial Science and Technology. |
|
|
|
## License |
|
|
|
[META LLAMA 3 COMMUNITY LICENSE](https://llama.meta.com/llama3/license/) |
|
|
|
## Authors |
|
|
|
Here are the team members:

- From [Tokyo Institute of Technology Okazaki Laboratory](https://www.nlp.c.titech.ac.jp/index.en.html), the following members:
  - [Naoaki Okazaki](https://www.chokkan.org/index.ja.html)
  - [Sakae Mizuki](https://s-mizuki-nlp.github.io/)
  - [Youmi Ma](https://www.nlp.c.titech.ac.jp/member/youmi.en.html)
  - [Koki Maeda](https://sites.google.com/view/silviase)
  - [Kakeru Hattori](https://aya-se.vercel.app/)
  - [Masanari Ohi](https://sites.google.com/view/masanariohi)
  - [Taihei Shiotani](https://github.com/inatoihs)
  - [Koshiro Saito](https://sites.google.com/view/koshiro-saito)
- From [Tokyo Institute of Technology YOKOTA Laboratory](https://www.rio.gsic.titech.ac.jp/en/index.html), the following members:
  - [Rio Yokota](https://twitter.com/rioyokota)
  - [Kazuki Fujii](https://twitter.com/okoge_kaz)
  - [Taishi Nakamura](https://twitter.com/Setuna7777_2)
  - [Takumi Okamoto](https://www.linkedin.com/in/takumi-okamoto)
  - [Ishida Shigeki](https://www.wantedly.com/id/reborn27)
- From [Artificial Intelligence Research Center, AIST, Japan](https://www.airc.aist.go.jp/en/teams/), the following members:
  - [Hiroya Takamura](https://sites.google.com/view/hjtakamura)
|
|
|
## How to Cite |
|
|
|
If you find our work helpful, please feel free to cite us. |
|
|
|
```tex
@misc{llama3swallow,
  title={Llama 3 Swallow},
  url={https://swallow-llm.github.io/llama3-swallow.en.html},
  author={Swallow LLM},
  year={2024},
}
```
|
|
|
### Citations |
|
|
|
```tex
@article{llama3modelcard,
  title={Llama 3 Model Card},
  author={AI@Meta},
  year={2024},
  url = {https://github.com/meta-llama/llama3/blob/main/MODEL_CARD.md}
}
```