werty1248 committed on
Commit
89a67dc
1 Parent(s): a1ff61a

Update README.md

Files changed (1)
  1. README.md +9 -9
README.md CHANGED
@@ -13,11 +13,11 @@ language:
  - zh
  license: apache-2.0
  ---
- # Mistral-Nemo-NT-Ko-12B-dpo-test
+ # Mistral-Nemo-NT-Ko-12B-dpo

  ## Description

- **Mistral-Nemo-NT-Ko-12B-dpo-test** is a shallowly DPO-trained version of [*werty1248/Mistral-Nemo-NT-Ko-12B-sft*](https://huggingface.co/werty1248/Mistral-Nemo-NT-Ko-12B-sft).
+ **Mistral-Nemo-NT-Ko-12B-dpo** is a shallowly DPO-trained version of [*werty1248/Mistral-Nemo-NT-Ko-12B-sft*](https://huggingface.co/werty1248/Mistral-Nemo-NT-Ko-12B-sft).

  According to the [Hermes 3 Tech Report](https://nousresearch.com/wp-content/uploads/2024/08/Hermes-3-Technical-Report.pdf), DPO made negligible performance improvements in their model. Therefore, I followed the same approach described in the report and applied DPO using LoRA.
  - LoRA r = 32
@@ -50,20 +50,20 @@ From each dataset, I sampled a subset based on the score given by the reward mod
  | 모델 | 방법 | 추론 | 수학 | 글쓰기 | 코딩 | 이해 | 문법 | 싱글턴 | 멀티턴 | 총점 |
  | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- |
  |Mistral-Nemo-NT-Ko-12B-sft| cot-1-shot |7.36 | 6.57 | 8.71 | 8.57 | 9.57 | 6.43 | 7.81 | 7.93 | **7.87** |
- |**Mistral-Nemo-NT-Ko-12B-dpo-test**| cot-1-shot | 6.79 | 6.43 | 9.43 | 9.79 | 9.43 | 5.29 | 7.71 | 8.00 | **7.86** |
+ |**Mistral-Nemo-NT-Ko-12B-dpo**| cot-1-shot | 6.79 | 6.43 | 9.43 | 9.79 | 9.43 | 5.29 | 7.71 | 8.00 | **7.86** |
  | Mistral Nemo | cot-1-shot | 5.43 | 6.86 | 6.07 | 7.57 | 5.86 | 7.57 | 7.50 | 5.62 |6.56|

  *1-shot*
  | 모델 | 방법 | 추론 | 수학 | 글쓰기 | 코딩 | 이해 | 문법 | 싱글턴 | 멀티턴 | 총점 |
  | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- |
- |**Mistral-Nemo-NT-Ko-12B-dpo-test**| 1-shot | 8.14 | 5.50 | 9.36 | 8.57 | 9.50 | 4.71 | 7.38 | 7.88 | **7.63** |
+ |**Mistral-Nemo-NT-Ko-12B-dpo**| 1-shot | 8.14 | 5.50 | 9.36 | 8.57 | 9.50 | 4.71 | 7.38 | 7.88 | **7.63** |
  |Mistral-Nemo-NT-Ko-12B-sft| 1-shot | 9.00 | 5.71 | 7.93 | 8.29 | 7.93 | 5.21 | 7.29 | 7.40 | 7.35 |
  | Mistral Nemo | 1-shot | 5.00 | 6.50 | 6.86 | 8.07 | 7.64 | 8.43 | 7.60 | 6.57 |7.08|

  *Default*
  | 모델 | 방법 | 추론 | 수학 | 글쓰기 | 코딩 | 이해 | 문법 | 싱글턴 | 멀티턴 | 총점 |
  | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- |
- |**Mistral-Nemo-NT-Ko-12B-dpo-test**| default | 6.21 | 5.79 | 8.00 | 8.36 | 9.43 | 5.43 | 7.17 | 7.24 | **7.20** |
+ |**Mistral-Nemo-NT-Ko-12B-dpo**| default | 6.21 | 5.79 | 8.00 | 8.36 | 9.43 | 5.43 | 7.17 | 7.24 | **7.20** |
  |Mistral-Nemo-NT-Ko-12B-sft| default | 6.00 | 4.93 | 5.43 | 7.14 | 9.71 | 4.00 | 6.45 | 5.95 | 6.20 |
  | Mistral Nemo | default | 0.43 | 7.64 | 6.21 | 7.14 | 6.79 | 7.21 | 6.26 | 5.55 |5.90|

@@ -71,16 +71,16 @@ From each dataset, I sampled a subset based on the score given by the reward mod

  | Model | Language | Monolingual-LPR | Monolingual-WPR | Crosslingual-LPR | Crosslingual-WPR |
  | --- | --- | --- | --- | --- | --- |
- |Mistral-Nemo-NT-Ko-12B-dpo-test| ko | 100.00% | 97.96% | **85.63%** | 96.93% |
+ |Mistral-Nemo-NT-Ko-12B-dpo| ko | 100.00% | 97.96% | **85.63%** | 96.93% |
  |Mistral-Nemo-NT-Ko-12B-sft| ko | 100.00% | 99.00% | **87.51%** | 96.96% |
  |Mistral-Nemo-Instruct-2407 | ko | 90.72% | 93.18% | 46.75% | 92.84% |
  |Meta-Llama-3.1-8B-Instruct | ko | 99.00% | 96.97% | 91.45% | 93.01% |
  |gemma-2-9b-it | ko | 100.00% | 98.00% | 87.93% | 95.58% |
  | --- | --- | --- | --- | --- | --- |
- |Mistral-Nemo-NT-Ko-12B-dpo-test| zh | 99.00% | 99.50% | **80.52%** | 97.51% |
+ |Mistral-Nemo-NT-Ko-12B-dpo| zh | 99.00% | 99.50% | **80.52%** | 97.51% |
  |Mistral-Nemo-Instruct-2407 | zh | 97.50% | 98.98% | 53.43% | 93.58% |
  | --- | --- | --- | --- | --- | --- |
- |Mistral-Nemo-NT-Ko-12B-dpo-test| ja | 100.00% | 100.00% | **86.89%** | 95.41% |
+ |Mistral-Nemo-NT-Ko-12B-dpo| ja | 100.00% | 100.00% | **86.89%** | 95.41% |
  |Mistral-Nemo-Instruct-2407 | ja | 94.00% | 98.94% | 50.27% | 96.05% |

  ## Template
@@ -185,6 +185,6 @@ special_tokens:
  </details><br>


- - Training loss
+ - reward margin

  ![image/png](https://cdn-uploads.huggingface.co/production/uploads/6629154d55d7c289634b8c5d/5m2K7azV5ZhGGZqWJZNWX.png)