Commit 7f1c97f
1 Parent(s): 4601beb
Update README.md (#2)
- Update README.md (bdfd9ce70dee88dba8e231ad9ee43d415139ecb3)
Co-authored-by: VILARIN <[email protected]>
README.md
CHANGED
@@ -17,7 +17,7 @@ library_name: transformers
 
 # Model Summary
 
-> OLMoE-1B-7B-Instruct is a Mixture-of-Experts LLM with 1B active and 7B total parameters released in September 2024 (0924) that has been adapted via SFT and DPO from [OLMoE-1B-7B](https://hf.co/
+> OLMoE-1B-7B-Instruct is a Mixture-of-Experts LLM with 1B active and 7B total parameters released in September 2024 (0924) that has been adapted via SFT and DPO from [OLMoE-1B-7B](https://hf.co/allenai/OLMoE-1B-7B-0924). It yields state-of-the-art performance among models with a similar cost (1B) and is competitive with much larger models like Llama2-13B-Chat. OLMoE is 100% open-source.
 
 This information and more can also be found on the [**OLMoE GitHub repository**](https://github.com/allenai/OLMoE).
 - **Paper**: https://arxiv.org/abs/2409.02060
@@ -53,7 +53,7 @@ Here's how it works: imagine you have a bunch of toys, and you want to
 ```
 
 Branches:
-- `main`: Preference tuned via DPO model of https://hf.co/
+- `main`: Preference tuned via DPO model of https://hf.co/allenai/OLMoE-1B-7B-0924-SFT (`main` branch)
 - `load-balancing`: Ablation with load balancing loss during DPO starting from the `load-balancing` branch of https://hf.co/allenai/OLMoE-1B-7B-0924-SFT
 - `non-annealed`: Ablation starting from the `non-annealed` branch of https://hf.co/allenai/OLMoE-1B-7B-0924-SFT which is an SFT of the pretraining checkpoint prior to annealing (branch `step1200000-tokens5033B` of https://hf.co/allenai/OLMoE-1B-7B-0924)
 - `kto`: Ablation using KTO instead of DPO. This branch is the checkpoint after 5,000 steps with the RMS optimizer. The other `kto*` branches correspond to the other checkpoints mentioned in the paper.
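The branch list in this diff maps onto the `revision` argument of `from_pretrained` in `transformers`, which selects a branch, tag, or commit of a Hub repository. The following is a minimal sketch, not the authors' loading code: the helper names `revision_kwargs` and `load_checkpoint` are hypothetical, the `kto*` variants are matched by prefix because the diff does not list their exact names, and the repository id is left as a parameter.

```python
# Branch names taken from the README diff above; the `kto*` variants are
# matched by prefix because their exact names are not listed there.
KNOWN_BRANCHES = ("main", "load-balancing", "non-annealed", "kto")


def revision_kwargs(branch: str = "main") -> dict:
    """Return keyword arguments that pin `from_pretrained` to one branch."""
    if branch not in KNOWN_BRANCHES and not branch.startswith("kto"):
        raise ValueError(f"unknown branch: {branch!r}")
    return {"revision": branch}


def load_checkpoint(repo_id: str, branch: str = "main"):
    """Load the tokenizer and model from `repo_id` at the given branch."""
    # Imported lazily so revision_kwargs works without transformers installed.
    from transformers import AutoModelForCausalLM, AutoTokenizer

    kwargs = revision_kwargs(branch)
    tokenizer = AutoTokenizer.from_pretrained(repo_id, **kwargs)
    model = AutoModelForCausalLM.from_pretrained(repo_id, **kwargs)
    return tokenizer, model


# Example (downloads the full weights, so it is left commented out):
# tokenizer, model = load_checkpoint("allenai/OLMoE-1B-7B-0924-SFT", "load-balancing")
```

The commented example uses the SFT repository and its `load-balancing` branch because both are named in the diff; the same call should work for any repository and branch pair the README lists.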