---
extra_gated_prompt: >-
    ### 「LLM-jp-3 172B beta1」利用規約

    この利用規約（以下「本規約」といいます）は、大学共同利用機関法人 情報・システム研究機構 国立情報学研究所（以下「開発者」といいます）による開発の成果物として公開する大規模言語モデル「LLM-jp-3 172B beta1」の利用に関する条件を定めるものです。「LLM-jp-3 172B beta1」の利用者（以下「利用者」といいます）は、本規約に同意した上で「LLM-jp-3 172B beta1」を利用するものとします。

    - 第１条（利用許諾）
    利用者は、本規約に従い、「LLM-jp-3 172B beta1」を非商用目的でのみ利用することができます。なお、利用の範囲には、「LLM-jp-3 172B beta1」の改変および複製を含むものとします。本規約に違反した利用者は、「LLM-jp-3 172B beta1」を利用することはできません。

    - 第２条（責任）
    １．利用者は、「LLM-jp-3 172B beta1」は現状有姿で提供され、開発者は、明示または黙示を問わず、「LLM-jp-3 172B beta1」に関し、その正確性、完全性、最新性、および品質など、いかなる保証も行わず、利用者が本「LLM-jp-3 172B beta1」を利用したこと、利用できなかったことにより生じた一切の損害について責任を負わないことを、予め承諾するものとします。
    ２．利用者は、利用者による「LLM-jp-3 172B beta1」の利用により、または、利用者が本利用規約に違反したことにより開発者が損害を被った場合、当該損害を賠償するものとします。
    ３．利用者は、自己の責任と判断において利用するものとし、「LLM-jp-3 172B beta1」の利用に関して、第三者との間で生じた紛争について、自らの責任と負担で対応し、開発者に一切の迷惑を掛けないものとします。利用者は「LLM-jp-3 172B beta1」の利用によって生じた損害について自己の責任で対処するものとします。

    - 第３条（禁止行為） 利用者は「LLM-jp-3 172B beta1」を利用して以下の行為を行わないものとします。
    (1)	開発者もしくは第三者の知的財産権を侵害する行為、または侵害するおそれのある行為
    (2)	開発者もしくは第三者の財産、プライバシーもしくは肖像権を侵害する行為、または侵害するおそれのある行為
    (3)	開発者もしくは第三者を差別もしくは誹謗中傷・侮辱し、他者への差別を助長し、または名誉もしくは信用を毀損する行為
    (4)	開発者もしくは第三者への迷惑行為、または迷惑になる恐れのある行為
    (5)	許可されていない法律業務に従事したり、有資格の専門家以外からの法律アドバイスを提供したりする行為
    (6)	有資格の専門家以外からの財務アドバイスを提供する行為
    (7)	健康への助言や治療方法の提示などを含む医療行為
    (8)	その他法令に基づく許可等が必要な行為

    - 第４条（制約事項）
    １．利用者は、「LLM-jp-3 172B beta1」を用いた処理の結果物（以下「処理結果」という）には、虚偽や偏り、他人の権利を侵害する内容、または利用者の想定する有効性や有用性を満たさない内容が含まれている場合があることを承諾し、不正確・不適切な処理結果により、自ら又は第三者の損害や権利侵害の発生、倫理的懸念が起こり得るという前提に立ち「LLM-jp-3 172B beta1」を利用するものとします。利用者は、処理結果の正誤や適法性、倫理的妥当性を自ら確認の上、利用するものとします。利用者が処理結果を含め「LLM-jp-3 172B beta1」を用いたことにより、利用者自身又は第三者の権利侵害を発生させた場合、開発者はその損害に対して一切の責任を負わないものとし、利用者は開発者に対し一切の迷惑を掛けないものとします。
    ２．利用者は処理結果について、それぞれの国や地域において法令などの規制を順守した上で利用するものとします。
    ３．利用者は、処理結果を第３条（禁止事項）に記載の行為に利用しないものとします。

    - 第５条（権利帰属等）
    １．利用者は、本利用規約で明示で定めるものを除き「LLM-jp-3 172B beta1」に関する一切の権利を取得することはありません。
    ２．利用者は、「LLM-jp-3 172B beta1」改変物の作成によって新たに発生した権利を取得しますが、改変物の利用に当たっては本利用規約に従って利用するものとします。
    ３．開発者は処理結果について、権利主張を行わないものとします。

    - 第６条（輸出取引）
    利用者は、「LLM-jp-3 172B beta1」および処理結果の利用に関連して外国為替及び外国貿易法（これに関連する政省令を含む）または米国輸出管理法令で規定する許可が必要な輸出を行うときは、利用者自らが所定の許可を取得するものとします。

    - 第７条（管轄裁判所）
    本利用規約に関し生じた紛争については、東京地方裁判所をもって第一審の専属的合意管轄裁判所とします。

    - 第８条（準拠法）
    本利用規約は日本法に準拠します。

    - 第９条（その他の規定）
    本規約は、「LLM-jp-3 172B beta1」の利用者と開発者との間の利用に関する全ての事項を定めるものであり、本規約に定めのない事項については、関係法令に従うものとします。

    - 第１０条（言語）
    本規約は日本語を正本とします。本規約の英訳版は、参考のために作成されたものであり、何らの法的拘束力もないものとします。

    以上

    ### LLM-jp-3 172B beta1 Terms of Use

    This Terms of Use (hereinafter referred to as "TOU") sets forth the conditions for the use of the large-scale language model (hereinafter referred to as "LLM-jp-3 172B beta1") that is made public as a result of the development by the Research and Development Center for Large Language Models at the National Institute of Informatics (hereinafter referred to as "the Developer"). Users of LLM-jp-3 172B beta1 (hereinafter referred to as "Users") shall use LLM-jp-3 172B beta1 upon agreeing to the TOU.

    - Article 1 (License to Use)

    Users of LLM-jp-3 172B beta1 may use LLM-jp-3 172B beta1 for non-commercial purposes in accordance with the TOU. The word “use” includes the modification and duplication of LLM-jp-3 172B beta1. Users who violate the TOU are not allowed to use LLM-jp-3 172B beta1.

    - Article 2 (Responsibility)
        1. Users agree in advance that LLM-jp-3 172B beta1 is provided “AS IS”, and the Developer makes no warranties, express or implied, regarding LLM-jp-3 172B beta1, including, but not limited to, its accuracy, completeness, up-to-dateness, and quality, and that Developer shall not be liable for any damages arising from the use or inability to use LLM-jp-3 172B beta1.
        2. Users shall compensate for any and all damages suffered by the Developer as a result of the use of LLM-jp-3 172B beta1 and/or the Users' violation of the TOU.
        3. Users shall use LLM-jp-3 172B beta1 at their own responsibility and discretion, and shall handle any disputes arising with third parties in relation to the use of LLM-jp-3 172B beta1 at their own responsibility and expense, and shall indemnify, defend and hold harmless the Developer against all damages and losses without causing any inconvenience to the Developer. Users shall deal with any damages caused by the use of LLM-jp-3 172B beta1 at their own responsibility.

    - Article 3 (Prohibited Actions)
    
    Users shall not engage in the following actions when using LLM-jp-3 172B beta1.
        1. Actions that will or may infringe on the intellectual property rights of the Developer or third parties;
        2. Actions that will or may infringe on the property, privacy, or portrait rights of the Developer or third parties; 
        3. Actions that discriminate against, defame, insult, or slander the Developer or third parties, promote discrimination against others, or damage the reputation or credibility of others;
        4. Actions that will or may cause inconvenience or harm to the Developer or third parties;
        5. Actions that engage in unauthorized legal services and/or provide legal advice from anyone other than a qualified professional;
        6. Actions that provide financial advice from anyone other than a qualified professional;
        7. Medical actions, including providing health advice or suggesting treatment methods; and
        8. Other actions that require permissions or other forms of authorization under laws and regulations.

    - Article 4 (Restrictions)
        1. Users acknowledge that the results of processing using LLM-jp-3 172B beta1 (hereinafter referred to as "Processing Results") may contain falsehoods, biases, content that infringes on the rights of others, or content that does not meet the effectiveness or usefulness expected by Users, and agree to use LLM-jp-3 172B beta1 on the premise that inaccurate or inappropriate Processing Results may cause damage or infringement of rights to Users or third parties and/or ethical concerns. Users shall use the Processing Results after confirming their accuracy, legality, and ethical validity themselves. If the use of LLM-jp-3 172B beta1, including the Processing Results, by Users causes infringement of the rights of the Users themselves or third parties, the Developer shall not be responsible for any damages, and the Users shall indemnify, defend and hold harmless the Developer against all damages and losses without causing any inconvenience to the Developer.
        2. Users shall use the Processing Results in compliance with the regulations such as laws and regulations in each country and region.
        3. Users shall not use the Processing Results for the actions listed in Article 3 (Prohibited Actions).

    - Article 5 (Ownership of Rights)
        1. Except as expressly provided in the TOU, Users shall not acquire any rights in relation to LLM-jp-3 172B beta1.
        2. Users will acquire rights newly arising from the creation of Modified Works of LLM-jp-3 172B beta1, but Users shall use Modified Works in accordance with the TOU.
        3. The Developer shall not assert any rights to the Processing Results. 

    - Article 6 (Export Transaction)

    Users shall obtain the necessary permissions themselves when exporting LLM-jp-3 172B beta1 and the Processing Results in relation to their use, where such export requires permissions under the Foreign Exchange and Foreign Trade Act (including related cabinet order and ministerial order) or U.S. export control laws and regulations.

    - Article 7 (Jurisdiction)
    
    The Tokyo District Court shall have exclusive jurisdiction in the court of the first instance over any disputes arising out of or in connection with the TOU.

    - Article 8 (Governing Law)
    
    The TOU is governed by and construed in accordance with the laws of Japan.

    - Article 9 (Other Provisions)
    
    The TOU sets forth the entire agreement as to all matters concerning the use of LLM-jp-3 172B beta1 between the Users and the Developer, and matters not provided for in the TOU shall be governed by the relevant laws and regulations. 

    - Article 10 (Governing Language)

    The governing language of the TOU shall be Japanese. The English translation hereof is made for reference purposes only and shall have no effect.


extra_gated_fields:
    Affiliation: text
    I want to use this model for: text
    I agree to use this model for non-commercial use ONLY: checkbox

license: other
license_name: llm-jp-3-172b-beta1-tou
license_link: LICENSE
language:
  - en
  - ja
programming_language:
  - C
  - C++
  - C#
  - Go
  - Java
  - JavaScript
  - Lua
  - PHP
  - Python
  - Ruby
  - Rust
  - Scala
  - TypeScript
library_name: transformers
pipeline_tag: text-generation
inference: false
---
# llm-jp-3-172b-beta1

This repository provides large language models developed by the [Research and Development Center for Large Language Models](https://llmc.nii.ac.jp/) at the [National Institute of Informatics](https://www.nii.ac.jp/en/).

| Model Variant | 
| :--- |
| [llm-jp-3-172b-beta1](https://huggingface.co/llm-jp/llm-jp-3-172b-beta1) |
| [llm-jp-3-172b-beta1-instruct](https://huggingface.co/llm-jp/llm-jp-3-172b-beta1-instruct) |


Checkpoint format: Hugging Face Transformers


## Required Libraries and Their Versions

- torch>=2.3.0
- transformers>=4.40.1
- tokenizers>=0.19.1
- accelerate>=0.29.3
- flash-attn>=2.5.8
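
As a convenience, the minimum versions listed above can be checked before loading the model. The following is a minimal sketch using only the standard library; the helper names (`parse`, `check`, `installed_versions`) are hypothetical, and the version comparison is deliberately rough (it looks only at the first three numeric components and ignores pre-release tags).

```python
from importlib.metadata import PackageNotFoundError, version

# Minimum versions copied from the list above.
REQUIRED = {
    "torch": "2.3.0",
    "transformers": "4.40.1",
    "tokenizers": "0.19.1",
    "accelerate": "0.29.3",
    "flash-attn": "2.5.8",
}


def parse(v: str) -> tuple:
    """Turn '2.3.0+cu121' into (2, 3, 0); non-numeric suffixes are ignored."""
    parts = []
    for piece in v.split("+")[0].split(".")[:3]:
        digits = ""
        for ch in piece:
            if not ch.isdigit():
                break
            digits += ch
        parts.append(int(digits or 0))
    while len(parts) < 3:
        parts.append(0)  # pad '4.40' to (4, 40, 0)
    return tuple(parts)


def check(installed: dict) -> list:
    """Return the names of packages that are missing or older than required."""
    problems = []
    for name, minimum in REQUIRED.items():
        found = installed.get(name)
        if found is None or parse(found) < parse(minimum):
            problems.append(name)
    return problems


def installed_versions() -> dict:
    """Look up the installed version of each required package, if any."""
    result = {}
    for name in REQUIRED:
        try:
            result[name] = version(name)
        except PackageNotFoundError:
            pass  # a missing package is reported by check()
    return result


if __name__ == "__main__":
    print(check(installed_versions()) or "all requirements satisfied")
```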

## Usage

```python
import torch
from transformers import AutoTokenizer, AutoModelForCausalLM

# Load the tokenizer and the model. device_map="auto" places the weights
# across the available devices; bfloat16 halves memory relative to float32.
tokenizer = AutoTokenizer.from_pretrained("llm-jp/llm-jp-3-172b-beta1")
model = AutoModelForCausalLM.from_pretrained("llm-jp/llm-jp-3-172b-beta1", device_map="auto", torch_dtype=torch.bfloat16)

# Prompt: "What is natural language processing?"
text = "自然言語処理とは何か"
tokenized_input = tokenizer.encode(text, add_special_tokens=False, return_tensors="pt").to(model.device)

# Sample a continuation of up to 100 new tokens without tracking gradients.
with torch.no_grad():
    output = model.generate(
        tokenized_input,
        max_new_tokens=100,
        do_sample=True,
        top_p=0.95,
        temperature=0.7,
        repetition_penalty=1.05,
    )[0]
print(tokenizer.decode(output))
```


## Model Details

- **Model type:** Transformer-based Language Model
- **Total seen tokens:** 700B

|Params|Layers|Hidden size|Heads|Context length|
|:---:|:---:|:---:|:---:|:---:|
|172B|96|12288|96|4096|
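
For rough capacity planning, the figures in the table translate into a per-head dimension and a lower bound on memory for the weights. The sketch below assumes the nominal 172B parameter count, 2 bytes per weight (bfloat16, as in the usage example), and 80 GB of memory per accelerator; the 80 GB figure is an assumption and is not stated in this card. It counts weights only and ignores the KV cache, activations, and framework overhead.

```python
import math

PARAMS = 172e9          # nominal parameter count from the table
HIDDEN_SIZE = 12288     # from the table
HEADS = 96              # from the table
BYTES_PER_PARAM = 2     # bfloat16 stores each weight in 2 bytes
GPU_MEMORY_GB = 80      # assumption: 80 GB per accelerator

head_dim = HIDDEN_SIZE // HEADS
weights_gb = PARAMS * BYTES_PER_PARAM / 1e9
min_gpus = math.ceil(weights_gb / GPU_MEMORY_GB)

print(f"head dimension: {head_dim}")
print(f"bf16 weights: {weights_gb:.0f} GB")
print(f"accelerators needed for weights alone: {min_gpus}")
```

In practice more devices than this lower bound are needed, because generation also requires memory for the KV cache and intermediate activations.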


## Training

- **Pre-training:**
  - **Hardware:** 512 H100 GPUs
  - **Software:** Megatron-LM

- **Instruction tuning:**
  - **Hardware:** 8 H100 GPUs
  - **Software:** [NVIDIA NeMo-Aligner](https://github.com/NVIDIA/NeMo-Aligner)

## Tokenizer

The tokenizer of this model is based on the [huggingface/tokenizers](https://github.com/huggingface/tokenizers) Unigram byte-fallback model.
The vocabulary entries were converted from [`llm-jp-tokenizer v3.0`](https://github.com/llm-jp/llm-jp-tokenizer/releases/tag/v3.0b2).
Please refer to [README.md](https://github.com/llm-jp/llm-jp-tokenizer) of `llm-jp-tokenizer` for details on the vocabulary construction procedure (the pure SentencePiece training does not reproduce our vocabulary).
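
To illustrate what byte fallback means, here is a toy sketch in plain Python. It is not the actual tokenizer: the vocabulary is hypothetical, and segmentation uses greedy longest-match instead of a Unigram model to keep the example short. The point it shows is that a character with no vocabulary entry is emitted as one token per UTF-8 byte (written in the `<0xNN>` style) rather than as an unknown-token placeholder.

```python
def tokenize_with_byte_fallback(text: str, vocab: set) -> list:
    """Greedy longest-match segmentation with UTF-8 byte fallback.

    NOTE: the real tokenizer uses a Unigram model; greedy matching is used
    here only to keep the illustration small.
    """
    max_len = max(len(piece) for piece in vocab)
    tokens = []
    i = 0
    while i < len(text):
        # Try the longest candidate piece first, then shorter ones.
        for length in range(min(max_len, len(text) - i), 0, -1):
            piece = text[i : i + length]
            if piece in vocab:
                tokens.append(piece)
                i += length
                break
        else:
            # No vocabulary entry covers this character: emit one token per
            # UTF-8 byte so that any input remains representable.
            tokens.extend(f"<0x{b:02X}>" for b in text[i].encode("utf-8"))
            i += 1
    return tokens


TOY_VOCAB = {"自然", "言語", "処理", "と", "は"}  # hypothetical vocabulary
print(tokenize_with_byte_fallback("自然言語処理とは何か", TOY_VOCAB))
```

With this toy vocabulary, the last two characters of the prompt are not covered, so each is split into three byte tokens.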

## Datasets

### Pre-training

The models have been pre-trained using a blend of the following datasets.

| Language | Dataset | Tokens|
|:---|:---|---:|
|Japanese|[Wikipedia](https://gitlab.llm-jp.nii.ac.jp/datasets/llm-jp-corpus-v3)|2.6B
||[Common Crawl](https://gitlab.llm-jp.nii.ac.jp/datasets/llm-jp-corpus-v3)|762.8B
||[WARP/PDF](https://gitlab.llm-jp.nii.ac.jp/datasets/llm-jp-corpus-v3)|237.3B
||[WARP/HTML](https://gitlab.llm-jp.nii.ac.jp/datasets/llm-jp-corpus-v3)|2.7B
||[Kaken](https://gitlab.llm-jp.nii.ac.jp/datasets/llm-jp-corpus-v3)|1.8B
|English|[Wikipedia](https://gitlab.llm-jp.nii.ac.jp/datasets/llm-jp-corpus-v3)|4.7B
||[Dolma/CC-head](https://gitlab.llm-jp.nii.ac.jp/datasets/llm-jp-corpus-v3)|608.5B
||[Dolma/C4](https://gitlab.llm-jp.nii.ac.jp/datasets/llm-jp-corpus-v3)|181.6B
||[Dolma/Reddit](https://gitlab.llm-jp.nii.ac.jp/datasets/llm-jp-corpus-v3)|83.1B
||[Dolma/PeS2o](https://gitlab.llm-jp.nii.ac.jp/datasets/llm-jp-corpus-v3)|62.9B
||[Dolma/Gutenberg](https://gitlab.llm-jp.nii.ac.jp/datasets/llm-jp-corpus-v3)|5.5B
||[Dolma/Wiki](https://gitlab.llm-jp.nii.ac.jp/datasets/llm-jp-corpus-v3)|3.9B
|Code|[The Stack](https://huggingface.co/datasets/bigcode/the-stack)|114.1B
|Chinese|[Wikipedia](https://gitlab.llm-jp.nii.ac.jp/datasets/llm-jp-corpus-v3)|0.8B
|Korean|[Wikipedia](https://gitlab.llm-jp.nii.ac.jp/datasets/llm-jp-corpus-v3)|0.3B
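
The per-language totals implied by the table can be worked out with a few lines of arithmetic. The sketch below only sums the token counts listed above; it says nothing about how the blend was sampled during training. Note that the listed corpus sizes add up to more than the 700B seen tokens reported under Model Details.

```python
# Token counts (in billions) copied from the table above.
TOKENS_B = {
    "Japanese": [2.6, 762.8, 237.3, 2.7, 1.8],
    "English": [4.7, 608.5, 181.6, 83.1, 62.9, 5.5, 3.9],
    "Code": [114.1],
    "Chinese": [0.8],
    "Korean": [0.3],
}

# Round to one decimal place to match the precision of the table.
totals = {lang: round(sum(counts), 1) for lang, counts in TOKENS_B.items()}
grand_total = round(sum(totals.values()), 1)

for lang, total in totals.items():
    print(f"{lang:<8} {total:>7.1f}B  {100 * total / grand_total:5.1f}%")
print(f"{'Total':<8} {grand_total:>7.1f}B")
```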

### Instruction tuning

The models have been fine-tuned on the following datasets.
 
| Language | Dataset | Description |
|:---|:---|:---|
|Japanese|[ichikara-instruction-004-002](https://liat-aip.sakura.ne.jp/wp/llm%e3%81%ae%e3%81%9f%e3%82%81%e3%81%ae%e6%97%a5%e6%9c%ac%e8%aa%9e%e3%82%a4%e3%83%b3%e3%82%b9%e3%83%88%e3%83%a9%e3%82%af%e3%82%b7%e3%83%a7%e3%83%b3%e3%83%87%e3%83%bc%e3%82%bf%e4%bd%9c%e6%88%90/llm%e3%81%ae%e3%81%9f%e3%82%81%e3%81%ae%e6%97%a5%e6%9c%ac%e8%aa%9e%e3%82%a4%e3%83%b3%e3%82%b9%e3%83%88%e3%83%a9%e3%82%af%e3%82%b7%e3%83%a7%e3%83%b3%e3%83%87%e3%83%bc%e3%82%bf-%e5%85%ac%e9%96%8b/)| A manually constructed Japanese instruction dataset |
|        |[answer-carefully-001](https://liat-aip.sakura.ne.jp/wp/answercarefully-dataset/)| A manually constructed Japanese instruction dataset focusing on LLMs' safety |
|        |[databricks-dolly-15k-ja](https://huggingface.co/datasets/llm-jp/databricks-dolly-15k-ja)| [databricks-dolly-15k](https://huggingface.co/datasets/databricks/databricks-dolly-15k) translated into Japanese using DeepL  |
|        |[oasst1-21k-ja](https://huggingface.co/datasets/llm-jp/oasst1-21k-ja)| A subset of [oasst1](https://huggingface.co/datasets/OpenAssistant/oasst1) translated into Japanese using DeepL |
|        |[oasst2-33k-ja](https://huggingface.co/datasets/llm-jp/oasst2-33k-ja)| A subset of [oasst2](https://huggingface.co/datasets/OpenAssistant/oasst2) translated into Japanese using DeepL |
|        |aya-dataset-ja| A Japanese subset of [aya_dataset](https://huggingface.co/datasets/CohereForAI/aya_dataset) | 
|        |ichikara-instruction-format| A small instruction dataset derived from ichikara-instruction, with some constraints on the output format. | 
|English |[databricks-dolly-15k](https://huggingface.co/datasets/databricks/databricks-dolly-15k) | - | 
|        |[oasst1-21k-en](https://huggingface.co/datasets/llm-jp/oasst1-21k-en)| A subset of [oasst1](https://huggingface.co/datasets/OpenAssistant/oasst1) |
|        |[oasst2-33k-en](https://huggingface.co/datasets/llm-jp/oasst2-33k-en)| A subset of [oasst2](https://huggingface.co/datasets/OpenAssistant/oasst2) |
|        |[Daring-Anteater](https://huggingface.co/datasets/nvidia/Daring-Anteater)| - | 
|        |[FLAN](https://huggingface.co/datasets/Open-Orca/FLAN) | We used a sampled subset. | 

## Risks and Limitations

The models released here are in the early stages of our research and development and have not been tuned to ensure outputs align with human intent and safety considerations.


## Send Questions to

llm-jp(at)nii.ac.jp


## License

See the [LICENSE](LICENSE) file.


## Model Card Authors

*The names are listed in alphabetical order.*

Hirokazu Kiyomaru.