Update README

README.md (CHANGED)
@@ -7,271 +7,21 @@ tags:

**Before** (the previous model card):

```yaml
tags:
- mergekit
- merge
- Yi
license_name: yi-license
license_link: https://huggingface.co/01-ai/Yi-34B/blob/main/LICENSE
base_model: []
model-index:
- name: Yi-34B-200K-DARE-merge-v7
  results:
  - task:
      type: text-generation
      name: Text Generation
    dataset:
      name: AI2 Reasoning Challenge (25-Shot)
      type: ai2_arc
      config: ARC-Challenge
      split: test
      args:
        num_few_shot: 25
    metrics:
    - type: acc_norm
      value: 68.09
      name: normalized accuracy
    source:
      url: https://huggingface.co/spaces/HuggingFaceH4/open_llm_leaderboard?query=brucethemoose/Yi-34B-200K-DARE-merge-v7
      name: Open LLM Leaderboard
  - task:
      type: text-generation
      name: Text Generation
    dataset:
      name: HellaSwag (10-Shot)
      type: hellaswag
      split: validation
      args:
        num_few_shot: 10
    metrics:
    - type: acc_norm
      value: 85.99
      name: normalized accuracy
    source:
      url: https://huggingface.co/spaces/HuggingFaceH4/open_llm_leaderboard?query=brucethemoose/Yi-34B-200K-DARE-merge-v7
      name: Open LLM Leaderboard
  - task:
      type: text-generation
      name: Text Generation
    dataset:
      name: MMLU (5-Shot)
      type: cais/mmlu
      config: all
      split: test
      args:
        num_few_shot: 5
    metrics:
    - type: acc
      value: 77.3
      name: accuracy
    source:
      url: https://huggingface.co/spaces/HuggingFaceH4/open_llm_leaderboard?query=brucethemoose/Yi-34B-200K-DARE-merge-v7
      name: Open LLM Leaderboard
  - task:
      type: text-generation
      name: Text Generation
    dataset:
      name: TruthfulQA (0-shot)
      type: truthful_qa
      config: multiple_choice
      split: validation
      args:
        num_few_shot: 0
    metrics:
    - type: mc2
      value: 58.9
    source:
      url: https://huggingface.co/spaces/HuggingFaceH4/open_llm_leaderboard?query=brucethemoose/Yi-34B-200K-DARE-merge-v7
      name: Open LLM Leaderboard
  - task:
      type: text-generation
      name: Text Generation
    dataset:
      name: Winogrande (5-shot)
      type: winogrande
      config: winogrande_xl
      split: validation
      args:
        num_few_shot: 5
    metrics:
    - type: acc
      value: 83.11
      name: accuracy
    source:
      url: https://huggingface.co/spaces/HuggingFaceH4/open_llm_leaderboard?query=brucethemoose/Yi-34B-200K-DARE-merge-v7
      name: Open LLM Leaderboard
  - task:
      type: text-generation
      name: Text Generation
    dataset:
      name: GSM8k (5-shot)
      type: gsm8k
      config: main
      split: test
      args:
        num_few_shot: 5
    metrics:
    - type: acc
      value: 65.35
      name: accuracy
    source:
      url: https://huggingface.co/spaces/HuggingFaceH4/open_llm_leaderboard?query=brucethemoose/Yi-34B-200K-DARE-merge-v7
      name: Open LLM Leaderboard
---
```
# Possibly made obsolete by: https://huggingface.co/brucethemoose/Yi-34B-200K-DARE-megamerge-v8

A merge of several Yi 34B 200K models using the new DARE TIES method via mergekit. The goal is to create a merged model that excels at 32K+ context performance.

## Prompt template: Orca-Vicuna

```
SYSTEM: {system_message}
USER: {prompt}
ASSISTANT:
```

It might recognize ChatML, and possibly Alpaca-like formats. Raw prompting as described here is also effective: https://old.reddit.com/r/LocalLLaMA/comments/18zqy4s/the_secret_to_writing_quality_stories_with_llms/
132 |
-
|
133 |
-
|
134 |
-
|
135 |
-
## Running
|
136 |
-
Being a Yi model, try running a lower temperature with 0.02-0.06 MinP, a little repetition penalty, maybe mirostat with a low tau, and no other samplers. Yi tends to run "hot" by default, and it really needs a low temperature + MinP to cull the huge vocabulary.
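As a concrete starting point, here is a sketch of such a setup with exllamav2's sampler settings (the exact numbers are illustrative assumptions, not tuned values):

```python
from exllamav2.generator import ExLlamaV2Sampler

settings = ExLlamaV2Sampler.Settings()
settings.temperature = 0.7                # cooler than the 1.0 default
settings.min_p = 0.05                     # inside the suggested 0.02-0.06 band
settings.token_repetition_penalty = 1.05  # "a little" repetition penalty
settings.top_k = 0                        # leave the other samplers disabled
settings.top_p = 1.0
```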
24GB GPUs can efficiently run Yi-34B-200K models at **45K-90K context** with exllamav2 and performant UIs like [exui](https://github.com/turboderp/exui). I go into more detail in this [post](https://old.reddit.com/r/LocalLLaMA/comments/1896igc/how_i_run_34b_models_at_75k_context_on_24gb_fast/). 16GB GPUs can still run high context with aggressive quantization.

To load or train this model in full-context backends like transformers, you *must* change `max_position_embeddings` in config.json to a value lower than 200,000, otherwise you will OOM! I do not recommend running high context without context-efficient backends like exllamav2 or unsloth.
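For transformers, a minimal sketch of that override (the 32768 value is an arbitrary example; pick whatever your VRAM allows):

```python
from transformers import AutoConfig, AutoModelForCausalLM

model_id = "brucethemoose/Yi-34B-200K-DARE-merge-v7"
config = AutoConfig.from_pretrained(model_id)
config.max_position_embeddings = 32768  # lowered from 200000 to avoid OOM
model = AutoModelForCausalLM.from_pretrained(
    model_id,
    config=config,
    torch_dtype="auto",
    device_map="auto",
)
```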
## Testing Notes

See: https://huggingface.co/brucethemoose/Yi-34B-200K-DARE-merge-v5#testing-notes

A "4k" merge model was created to try to extend the context of SUS Chat and DPO-bagel before adding them to the merge: https://huggingface.co/brucethemoose/SUS-Bagel-200K-DARE-Test

In addition, the weight gradients are biased towards the Vicuna-format models in the first few layers to "emphasize" the Orca-Vicuna prompt template. How successful this is remains to be seen.

### Merge Method

This model was merged using the [DARE](https://arxiv.org/abs/2311.03099) [TIES](https://arxiv.org/abs/2306.01708) merge method, with /home/alpha/Storage/Models/Raw/chargoddard_Yi-34B-200K-Llama as the base.

### Models Merged

The following models were included in the merge:

* https://huggingface.co/kyujinpy/PlatYi-34B-200k-Q-FastChat
* https://huggingface.co/jondurbin/bagel-34b-v0.2
* https://huggingface.co/NousResearch/Nous-Capybara-34B
* https://huggingface.co/migtissera/Tess-M-Creative-v1.0
* https://huggingface.co/brucethemoose/SUS-Bagel-200K-DARE-Test
* https://huggingface.co/Mihaiii/Pallas-0.5
* https://huggingface.co/bhenrym14/airoboros-3_1-yi-34b-200k
* https://huggingface.co/adamo1139/Yi-34B-200K-AEZAKMI-v2
* https://huggingface.co/migtissera/Tess-34B-v1.4
* https://huggingface.co/SUSTech/SUS-Chat-34B
* https://huggingface.co/jondurbin/bagel-dpo-34b-v0.2
* https://huggingface.co/chargoddard/Yi-34B-200K-Llama
* https://huggingface.co/chargoddard/Yi-34B-Llama

### Configuration

The following YAML configuration was used to produce this model:

```yaml
models:
  - model: /home/alpha/Storage/Models/Raw/chargoddard_Yi-34B-200K-Llama
    # No parameters necessary for base model
  - model: /home/alpha/Storage/Models/Raw/migtissera_Tess-34B-v1.4
    parameters:
      weight: [0.23, 0.125, 0.125, 0.125, 0.125, 0.125]
      density: 0.59
  - model: /home/alpha/Models/Raw/Mihaiii_Pallas-0.5
    parameters:
      weight: [0.23, 0.125, 0.125, 0.125, 0.125, 0.125]
      density: 0.59
  - model: /home/alpha/Storage/Models/Raw/bhenrym14_airoboros-3_1-yi-34b-200k
    parameters:
      weight: [0.02, 0.106, 0.106, 0.106, 0.106, 0.106]
      density: 0.59
  - model: /home/alpha/Storage/Models/Raw/jondurbin_bagel-34b-v0.2
    # Only the SFT version in the main merge, since the DPO version seems to have no long-context ability at all
    parameters:
      weight: [0.02, 0.100, 0.100, 0.100, 0.100, 0.100]
      density: 0.4
  - model: /home/alpha/Storage/Models/Raw/kyujinpy_PlatYi-34B-200k-Q-FastChat
    parameters:
      weight: [0.02, 0.100, 0.100, 0.100, 0.100, 0.100]
      density: 0.59
  # - model: /home/alpha/Storage/Models/Raw/ehartford_dolphin-2.2-yi-34b-200k
  #   # Dolphin 200K seems to be funky according to multiple leaderboards and perplexity tests?
  #   parameters:
  #     weight: 0.15
  #     density: 0.6
  - model: /home/alpha/Models/Raw/adamo1139_Yi-34B-200K-AEZAKMI-v2
    parameters:
      weight: [0.02, 0.110, 0.110, 0.110, 0.110, 0.110]
      density: 0.59
  - model: /home/alpha/Storage/Models/Raw/Nous-Capybara-34B
    parameters:
      weight: [0.22, 0.126, 0.126, 0.126, 0.126, 0.126]
      density: 0.59
  - model: /home/alpha/Storage/Models/Raw/4kmerge
    parameters:
      weight: [0.02, 0.108, 0.108, 0.108, 0.108, 0.108]
      density: 0.5
  - model: /home/alpha/Models/Raw/migtissera_Tess-M-Creative-v1.0
    parameters:
      weight: [0.22, 0.100, 0.100, 0.100, 0.100, 0.10]
      density: 0.59
merge_method: dare_ties
tokenizer_source: union
base_model: /home/alpha/Storage/Models/Raw/chargoddard_Yi-34B-200K-Llama
parameters:
  int8_mask: true
dtype: bfloat16
```

The following config was used for the "4kmerge" model:

```yaml
models:
  - model: /home/alpha/Models/Raw/chargoddard_Yi-34B-Llama
    # No parameters necessary for base model
  - model: /home/alpha/Storage/Models/Raw/chargoddard_Yi-34B-200K-Llama
    parameters:
      weight: 0.5
      density: 1
  - model: /home/alpha/Models/Raw/SUSTech_SUS-Chat-34B
    parameters:
      weight: 0.2
      density: 0.12
  - model: /home/alpha/Models/Raw/jondurbin_bagel-dpo-34b-v0.2
    parameters:
      weight: 0.2
      density: 0.15
  - model: /home/alpha/Models/Raw/jondurbin_bagel-34b-v0.2
    parameters:
      weight: 0.1
      density: 0.12
merge_method: dare_ties
tokenizer_source: union
base_model: /home/alpha/Models/Raw/chargoddard_Yi-34B-Llama
parameters:
  int8_mask: true
dtype: bfloat16
```
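Either configuration can be reproduced with mergekit's `mergekit-yaml` entry point. A minimal sketch, assuming the config above is saved as `config.yaml` (the output path is a placeholder; `--cuda` offloads the merge math to GPU):

```python
# Sketch: invoke mergekit on a saved config. Equivalent to running
#   mergekit-yaml config.yaml ./Yi-34B-200K-DARE-merge-v7 --cuda
# from a shell with mergekit installed.
import subprocess

subprocess.run(
    ["mergekit-yaml", "config.yaml", "./Yi-34B-200K-DARE-merge-v7", "--cuda"],
    check=True,
)
```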
# [Open LLM Leaderboard Evaluation Results](https://huggingface.co/spaces/HuggingFaceH4/open_llm_leaderboard)

Detailed results can be found [here](https://huggingface.co/datasets/open-llm-leaderboard/details_brucethemoose__Yi-34B-200K-DARE-merge-v7)

| Metric                            | Value |
|-----------------------------------|------:|
| Avg.                              | 73.12 |
| AI2 Reasoning Challenge (25-Shot) | 68.09 |
| HellaSwag (10-Shot)               | 85.99 |
| MMLU (5-Shot)                     | 77.30 |
| TruthfulQA (0-shot)               | 58.90 |
| Winogrande (5-shot)               | 83.11 |
| GSM8k (5-shot)                    | 65.35 |
**After** (the updated model card):

```yaml
tags:
- mergekit
- merge
- Yi
- exl2
- exllamav2
license_name: yi-license
license_link: https://huggingface.co/01-ai/Yi-34B/blob/main/LICENSE
base_model: []
---
```

# Yi-34B-200K-DARE-merge-v7-6bpw-exl2

Original model:

- [brucethemoose/Yi-34B-200K-DARE-megamerge-v7](https://huggingface.co/brucethemoose/Yi-34B-200K-DARE-megamerge-v7)

Available quants:
- [coryg89/Yi-34B-200K-DARE-merge-v7-4bpw-exl2](https://huggingface.co/coryg89/Yi-34B-200K-DARE-merge-v7-4bpw-exl2)
- [coryg89/Yi-34B-200K-DARE-merge-v7-5bpw-exl2](https://huggingface.co/coryg89/Yi-34B-200K-DARE-merge-v7-5bpw-exl2)
- [coryg89/Yi-34B-200K-DARE-merge-v7-6bpw-exl2](https://huggingface.co/coryg89/Yi-34B-200K-DARE-merge-v7-6bpw-exl2)
- [coryg89/Yi-34B-200K-DARE-merge-v7-8bpw-exl2](https://huggingface.co/coryg89/Yi-34B-200K-DARE-merge-v7-8bpw-exl2)
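A minimal loading sketch for one of these quants with exllamav2 (the local directory and the 45K `max_seq_len` are assumptions; see the original card for context-vs-VRAM guidance):

```python
from exllamav2 import ExLlamaV2, ExLlamaV2Cache, ExLlamaV2Config, ExLlamaV2Tokenizer
from exllamav2.generator import ExLlamaV2BaseGenerator, ExLlamaV2Sampler

config = ExLlamaV2Config()
config.model_dir = "./Yi-34B-200K-DARE-merge-v7-6bpw-exl2"  # local snapshot of a quant above
config.prepare()
config.max_seq_len = 45056  # shrink the 200K window so the cache fits in 24GB VRAM

model = ExLlamaV2(config)
cache = ExLlamaV2Cache(model, lazy=True)
model.load_autosplit(cache)  # split weights across available GPUs
tokenizer = ExLlamaV2Tokenizer(config)

generator = ExLlamaV2BaseGenerator(model, cache, tokenizer)
settings = ExLlamaV2Sampler.Settings()
settings.temperature = 0.7
settings.min_p = 0.05

prompt = "SYSTEM: You are a helpful assistant.\nUSER: Hello!\nASSISTANT:"
print(generator.generate_simple(prompt, settings, 200))
```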