tags:
- orpo
- llama 3
- rlhf
- sft
datasets:
- mlabonne/orpo-dpo-mix-40k
---

![](https://i.imgur.com/ZHwzQvI.png)

This is an ORPO fine-tune of [meta-llama/Meta-Llama-3-8B](https://huggingface.co/meta-llama/Meta-Llama-3-8B) on 1k samples of [mlabonne/orpo-dpo-mix-40k](https://huggingface.co/datasets/mlabonne/orpo-dpo-mix-40k) created for [this article](https://huggingface.co/blog/mlabonne/orpo-llama-3).

**Try the demo**: https://huggingface.co/spaces/mlabonne/OrpoLlama-3-8B

## 🔎 Application

This model uses a context window of 8k. It was trained with the ChatML template.
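As a quick illustration (not an official snippet from this card), a ChatML prompt can be built with the tokenizer's chat template through the standard 🤗 Transformers `apply_chat_template` API; the conversation content below is made up:

```python
from transformers import AutoTokenizer

tokenizer = AutoTokenizer.from_pretrained("mlabonne/OrpoLlama-3-8B")

# An example conversation (contents are arbitrary).
messages = [
    {"role": "system", "content": "You are a helpful assistant."},
    {"role": "user", "content": "What does ORPO stand for?"},
]

# Renders the messages with the tokenizer's chat template (ChatML here) and
# appends the tokens that cue the assistant's reply.
prompt = tokenizer.apply_chat_template(
    messages, tokenize=False, add_generation_prompt=True
)
print(prompt)
```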

## 🏆 Evaluation

### Nous

Evaluation performed using [LLM AutoEval](https://github.com/mlabonne/llm-autoeval).

| Model | Average | AGIEval | GPT4All | TruthfulQA | Bigbench |
| ------------------------------------------------------------------------------------------------------------------------------------------------------------------------- | --------: | --------: | --------: | ---------: | --------: |
| [meta-llama/Meta-Llama-3-8B-Instruct](https://huggingface.co/meta-llama/Meta-Llama-3-8B-Instruct) [📄](https://gist.github.com/mlabonne/8329284d86035e6019edb11eb0933628) | 51.34 | 41.22 | 69.86 | 51.65 | 42.64 |
| [**mlabonne/OrpoLlama-3-8B**](https://huggingface.co/mlabonne/OrpoLlama-3-8B) [📄](https://gist.github.com/mlabonne/22896a1ae164859931cc8f4858c97f6f) | **48.63** | **34.17** | **70.59** | **52.39** | **37.36** |
| [mlabonne/OrpoLlama-3-8B-1k](https://huggingface.co/mlabonne/OrpoLlama-3-8B) [📄](https://gist.github.com/mlabonne/f41dad371d1781d0434a4672fd6f0b82) | 46.76 | 31.56 | 70.19 | 48.11 | 37.17 |
| [meta-llama/Meta-Llama-3-8B](https://huggingface.co/meta-llama/Meta-Llama-3-8B) [📄](https://gist.github.com/mlabonne/616b6245137a9cfc4ea80e4c6e55d847) | 45.42 | 31.1 | 69.95 | 43.91 | 36.7 |

`mlabonne/OrpoLlama-3-8B-1k` corresponds to a version of this model trained on 1k samples (you can see the parameters in [this article](https://huggingface.co/blog/mlabonne/orpo-llama-3)).

### Open LLM Leaderboard

TBD.

## 📈 Training curves

You can find the experiment on W&B at [this address](https://wandb.ai/mlabonne/DPO/runs/vxnmq24z/workspace?nw=nwusermlabonne).

![image/png](https://cdn-uploads.huggingface.co/production/uploads/61b8e2ba285851687028d395/zm71HyZiG96YY1GUtpfHq.png)
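For context on what produced these curves: ORPO folds the preference (odds-ratio) penalty into the supervised loss, so a single trainer consumes chosen/rejected pairs directly. Below is a minimal sketch of such a run with TRL's `ORPOTrainer`; the hyperparameters are illustrative placeholders, not the exact values used for this model (those are documented in the article linked above):

```python
from datasets import load_dataset
from transformers import AutoModelForCausalLM, AutoTokenizer
from trl import ORPOConfig, ORPOTrainer

base = "meta-llama/Meta-Llama-3-8B"
tokenizer = AutoTokenizer.from_pretrained(base)
model = AutoModelForCausalLM.from_pretrained(base)

# orpo-dpo-mix-40k stores chosen/rejected pairs; depending on your TRL
# version you may need to flatten them into prompt/chosen/rejected strings.
dataset = load_dataset("mlabonne/orpo-dpo-mix-40k", split="train")

config = ORPOConfig(
    output_dir="./orpo-llama-3-8b",
    beta=0.1,                # weight of the odds-ratio term (illustrative)
    max_length=2048,
    max_prompt_length=1024,
    per_device_train_batch_size=2,
    learning_rate=8e-6,
    num_train_epochs=1,
)

# Newer TRL versions take `processing_class` instead of `tokenizer`.
trainer = ORPOTrainer(
    model=model,
    args=config,
    train_dataset=dataset,
    tokenizer=tokenizer,
)
trainer.train()
```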

## 💻 Usage
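As a starting point rather than a prescribed recipe, the model can be queried through the `transformers` text-generation pipeline; the dtype, `device_map`, and sampling settings below are assumptions to adjust for your hardware:

```python
import torch
from transformers import AutoTokenizer, pipeline

model_id = "mlabonne/OrpoLlama-3-8B"
tokenizer = AutoTokenizer.from_pretrained(model_id)

# Format the conversation with the model's ChatML template.
messages = [{"role": "user", "content": "Explain ORPO in one paragraph."}]
prompt = tokenizer.apply_chat_template(
    messages, tokenize=False, add_generation_prompt=True
)

generator = pipeline(
    "text-generation",
    model=model_id,
    tokenizer=tokenizer,
    torch_dtype=torch.bfloat16,  # assumption: bf16-capable GPU
    device_map="auto",
)

output = generator(prompt, max_new_tokens=256, do_sample=True, temperature=0.7)
print(output[0]["generated_text"])
```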