Fix active number of params in model card

#13
opened by lewtun (HF staff)
Files changed (1)
  1. README.md +8 -11
README.md CHANGED
@@ -18,9 +18,9 @@ inference:
18
  <img src="https://huggingface.co/HuggingFaceH4/zephyr-orpo-141b-A35b-v0.1/resolve/main/logo.png" alt="Zephyr 141B Logo" width="400" style="margin-left:'auto' margin-right:'auto' display:'block'"/>
19
 
20
 
21
- # Model Card for Zephyr 141B-A35B
22
 
23
- Zephyr is a series of language models that are trained to act as helpful assistants. Zephyr 141B-A35B is the latest model in the series, and is a fine-tuned version of [mistral-community/Mixtral-8x22B-v0.1](https://huggingface.co/mistral-community/Mixtral-8x22B-v0.1) that was trained using a novel alignment algorithm called [Odds Ratio Preference Optimization (ORPO)](https://huggingface.co/papers/2403.07691) with **7k instances** for **1.3 hours** on 4 nodes of 8 x H100s. ORPO does not require an SFT step to achieve high performance and is thus much more computationally efficient than methods like DPO and PPO. To train Zephyr-141B-A35B, we used the [`argilla/distilabel-capybara-dpo-7k-binarized`](https://huggingface.co/datasets/argilla/distilabel-capybara-dpo-7k-binarized) preference dataset, which consists of synthetic, high-quality, multi-turn preferences that have been scored via LLMs.
24
 
25
  > [!NOTE]
26
  > This model was trained collaboratively between Argilla, KAIST, and Hugging Face
@@ -31,7 +31,7 @@ Zephyr is a series of language models that are trained to act as helpful assista
31
 
32
  <!-- Provide a longer summary of what this model is. -->
33
 
34
- - **Model type:** A Mixture of Experts (MoE) model with 141B total parameters and 35B active parameters. Fine-tuned on a mix of publicly available, synthetic datasets.
35
  - **Language(s) (NLP):** Primarily English.
36
  - **License:** Apache 2.0
37
  - **Finetuned from model:** [mistral-community/Mixtral-8x22B-v0.1](https://huggingface.co/mistral-community/Mixtral-8x22B-v0.1)
@@ -45,11 +45,11 @@ Zephyr is a series of language models that are trained to act as helpful assista
45
 
46
  ## Performance
47
 
48
- Zephyr 141B-A35B was trained to test the effectiveness of ORPO at scale and the underlying dataset contains a mix of general chat capabilities. It achieves strong performance on chat benchmarks like [MT Bench](https://huggingface.co/spaces/lmsys/mt-bench) and [IFEval](https://arxiv.org/abs/2311.07911). The scores reported below were obtained using the [LightEval](https://github.com/huggingface/lighteval) evaluation suite and each prompt has been formatted with the model's corresponding chat template to simulate real-world usage. This is why some scores may differ from those reported in technical reports or on the Open LLM Leaderboard.
49
 
50
  | Model | MT Bench | IFEval | BBH | AGIEval |
51
  |-----------------------------------------------------------------------------------------------------|---------:|-------:|------:|--------:|
52
- | [zephyr-orpo-141b-A35b-v0.1](https://huggingface.co/HuggingFaceH4/zephyr-orpo-141b-A35b-v0.1) | 8.17 | 65.06 | 58.96 | 44.16 |
53
  | [databricks/dbrx-instruct](https://huggingface.co/databricks/dbrx-instruct) | 8.26 | 52.13 | 48.50 | 41.16 |
54
  | [mistralai/Mixtral-8x7B-Instruct-v0.1](https://huggingface.co/mistralai/Mixtral-8x7B-Instruct-v0.1) | 8.30 | 55.08 | 45.31 | 47.68 |
55
 
@@ -93,7 +93,7 @@ print(outputs[0]["generated_text"][-1]["content"])
93
 
94
  <!-- This section is meant to convey both technical and sociotechnical limitations. -->
95
 
96
- Zephyr 141B-A35B has not been aligned to human preferences for safety within the RLHF phase or deployed with in-the-loop filtering of responses like ChatGPT, so the model can produce problematic outputs (especially when prompted to do so).
97
  It is also unknown what the size and composition of the corpus was used to train the base model (`mistral-community/Mixtral-8x22B-v0.1`), however it is likely to have included a mix of Web data and technical sources like books and code. See the [Falcon 180B model card](https://huggingface.co/tiiuae/falcon-180B#training-data) for an example of this.
98
 
99
 
@@ -115,9 +115,6 @@ The following hyperparameters were used during training:
115
  - lr_scheduler_warmup_steps: 100
116
  - num_epochs: 3
117
 
118
- ### Training results
119
-
120
-
121
 
122
  ### Framework versions
123
 
@@ -128,7 +125,7 @@ The following hyperparameters were used during training:
128
 
129
  ## Citation
130
 
131
- If you find Zephyr 141B-A35B is useful in your work, please cite the ORPO paper:
132
 
133
  ```
134
  @misc{hong2024orpo,
@@ -146,7 +143,7 @@ You may also wish to cite the creators of this model:
146
  ```
147
  @misc{zephyr_141b,
148
  author = {Alvaro Bartolome and Jiwoo Hong and Noah Lee and Kashif Rasul and Lewis Tunstall},
149
- title = {Zephyr 141B A35B},
150
  year = {2024},
151
  publisher = {Hugging Face},
152
  journal = {Hugging Face repository},
 
18
  <img src="https://huggingface.co/HuggingFaceH4/zephyr-orpo-141b-A35b-v0.1/resolve/main/logo.png" alt="Zephyr 141B Logo" width="400" style="margin-left:'auto' margin-right:'auto' display:'block'"/>
19
 
20
 
21
+ # Model Card for Zephyr 141B-A39B
22
 
23
+ Zephyr is a series of language models that are trained to act as helpful assistants. Zephyr 141B-A39B is the latest model in the series, and is a fine-tuned version of [mistral-community/Mixtral-8x22B-v0.1](https://huggingface.co/mistral-community/Mixtral-8x22B-v0.1) that was trained using a novel alignment algorithm called [Odds Ratio Preference Optimization (ORPO)](https://huggingface.co/papers/2403.07691) with **7k instances** for **1.3 hours** on 4 nodes of 8 x H100s. ORPO does not require an SFT step to achieve high performance and is thus much more computationally efficient than methods like DPO and PPO. To train Zephyr-141B-A39B, we used the [`argilla/distilabel-capybara-dpo-7k-binarized`](https://huggingface.co/datasets/argilla/distilabel-capybara-dpo-7k-binarized) preference dataset, which consists of synthetic, high-quality, multi-turn preferences that have been scored via LLMs.
24
 
25
  > [!NOTE]
26
  > This model was trained collaboratively between Argilla, KAIST, and Hugging Face
 
31
 
32
  <!-- Provide a longer summary of what this model is. -->
33
 
34
+ - **Model type:** A Mixture of Experts (MoE) model with 141B total parameters and 39B active parameters. (We initially made a small error in calculating the number of active parameters for the model ID. The model card states the correct number.) Fine-tuned on a mix of publicly available, synthetic datasets.
35
  - **Language(s) (NLP):** Primarily English.
36
  - **License:** Apache 2.0
37
  - **Finetuned from model:** [mistral-community/Mixtral-8x22B-v0.1](https://huggingface.co/mistral-community/Mixtral-8x22B-v0.1)
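
Reviewer note: the corrected 39B figure can be sanity-checked with a back-of-the-envelope parameter count. The sketch below is not taken from this PR. It assumes architecture values believed to match the base model's `config.json` (hidden size 6144, 56 layers, 8 experts with 2 routed per token, and so on), so verify them against the actual config before relying on the result. `MoEConfig` and `count_params` are hypothetical names used only for illustration.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class MoEConfig:
    # ASSUMED values, believed to match Mixtral-8x22B; not stated in this PR.
    vocab_size: int = 32_000
    hidden_size: int = 6_144
    intermediate_size: int = 16_384
    num_layers: int = 56
    num_heads: int = 48
    num_kv_heads: int = 8
    num_experts: int = 8
    experts_per_token: int = 2


def count_params(cfg: MoEConfig) -> tuple[int, int]:
    """Return (total, active) parameter counts for a Mixtral-style MoE."""
    h = cfg.hidden_size
    head_dim = h // cfg.num_heads
    kv_dim = head_dim * cfg.num_kv_heads

    # Grouped-query attention: q and o are h x h, k and v are h x kv_dim.
    attention = 2 * h * h + 2 * h * kv_dim
    # Each expert is a gated MLP with three h x intermediate projections.
    expert = 3 * h * cfg.intermediate_size
    router = h * cfg.num_experts
    layer_norms = 2 * h

    # Parameters every token passes through regardless of routing.
    shared = (
        2 * cfg.vocab_size * h  # input embeddings + untied output head
        + cfg.num_layers * (attention + router + layer_norms)
        + h  # final norm
    )
    total = shared + cfg.num_layers * cfg.num_experts * expert
    active = shared + cfg.num_layers * cfg.experts_per_token * expert
    return total, active


total, active = count_params(MoEConfig())
print(f"total:  {total / 1e9:.1f}B")
print(f"active: {active / 1e9:.1f}B")
```

Under these assumptions the total rounds to 141B and the active count to 39B, which agrees with the numbers in the updated model card. Only the expert MLPs scale with the number of routed experts; attention, embeddings, routers, and norms are always active.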
 
45
 
46
  ## Performance
47
 
48
+ Zephyr 141B-A39B was trained to test the effectiveness of ORPO at scale and the underlying dataset contains a mix of general chat capabilities. It achieves strong performance on chat benchmarks like [MT Bench](https://huggingface.co/spaces/lmsys/mt-bench) and [IFEval](https://arxiv.org/abs/2311.07911). The scores reported below were obtained using the [LightEval](https://github.com/huggingface/lighteval) evaluation suite and each prompt has been formatted with the model's corresponding chat template to simulate real-world usage. This is why some scores may differ from those reported in technical reports or on the Open LLM Leaderboard.
49
 
50
  | Model | MT Bench | IFEval | BBH | AGIEval |
51
  |-----------------------------------------------------------------------------------------------------|---------:|-------:|------:|--------:|
52
+ | [zephyr-orpo-141b-A39b-v0.1](https://huggingface.co/HuggingFaceH4/zephyr-orpo-141b-A35b-v0.1) | 8.17 | 65.06 | 58.96 | 44.16 |
53
  | [databricks/dbrx-instruct](https://huggingface.co/databricks/dbrx-instruct) | 8.26 | 52.13 | 48.50 | 41.16 |
54
  | [mistralai/Mixtral-8x7B-Instruct-v0.1](https://huggingface.co/mistralai/Mixtral-8x7B-Instruct-v0.1) | 8.30 | 55.08 | 45.31 | 47.68 |
55
 
 
93
 
94
  <!-- This section is meant to convey both technical and sociotechnical limitations. -->
95
 
96
+ Zephyr 141B-A39B has not been aligned to human preferences for safety within the RLHF phase or deployed with in-the-loop filtering of responses like ChatGPT, so the model can produce problematic outputs (especially when prompted to do so).
97
  It is also unknown what the size and composition of the corpus was used to train the base model (`mistral-community/Mixtral-8x22B-v0.1`), however it is likely to have included a mix of Web data and technical sources like books and code. See the [Falcon 180B model card](https://huggingface.co/tiiuae/falcon-180B#training-data) for an example of this.
98
 
99
 
 
115
  - lr_scheduler_warmup_steps: 100
116
  - num_epochs: 3
117
 
 
 
 
118
 
119
  ### Framework versions
120
 
 
125
 
126
  ## Citation
127
 
128
+ If you find Zephyr 141B-A39B is useful in your work, please cite the ORPO paper:
129
 
130
  ```
131
  @misc{hong2024orpo,
 
143
  ```
144
  @misc{zephyr_141b,
145
  author = {Alvaro Bartolome and Jiwoo Hong and Noah Lee and Kashif Rasul and Lewis Tunstall},
146
+ title = {Zephyr 141B A39B},
147
  year = {2024},
148
  publisher = {Hugging Face},
149
  journal = {Hugging Face repository},