coryg89 committed on
Commit
bec63a5
1 Parent(s): 0cc268f

Update README

Files changed (1)
  1. README.md +10 -260
README.md CHANGED
@@ -7,271 +7,21 @@ tags:
  - mergekit
  - merge
  - Yi
  license_name: yi-license
  license_link: https://huggingface.co/01-ai/Yi-34B/blob/main/LICENSE
  base_model: []
- model-index:
- - name: Yi-34B-200K-DARE-merge-v7
-   results:
-   - task:
-       type: text-generation
-       name: Text Generation
-     dataset:
-       name: AI2 Reasoning Challenge (25-Shot)
-       type: ai2_arc
-       config: ARC-Challenge
-       split: test
-       args:
-         num_few_shot: 25
-     metrics:
-     - type: acc_norm
-       value: 68.09
-       name: normalized accuracy
-     source:
-       url: https://huggingface.co/spaces/HuggingFaceH4/open_llm_leaderboard?query=brucethemoose/Yi-34B-200K-DARE-merge-v7
-       name: Open LLM Leaderboard
-   - task:
-       type: text-generation
-       name: Text Generation
-     dataset:
-       name: HellaSwag (10-Shot)
-       type: hellaswag
-       split: validation
-       args:
-         num_few_shot: 10
-     metrics:
-     - type: acc_norm
-       value: 85.99
-       name: normalized accuracy
-     source:
-       url: https://huggingface.co/spaces/HuggingFaceH4/open_llm_leaderboard?query=brucethemoose/Yi-34B-200K-DARE-merge-v7
-       name: Open LLM Leaderboard
-   - task:
-       type: text-generation
-       name: Text Generation
-     dataset:
-       name: MMLU (5-Shot)
-       type: cais/mmlu
-       config: all
-       split: test
-       args:
-         num_few_shot: 5
-     metrics:
-     - type: acc
-       value: 77.3
-       name: accuracy
-     source:
-       url: https://huggingface.co/spaces/HuggingFaceH4/open_llm_leaderboard?query=brucethemoose/Yi-34B-200K-DARE-merge-v7
-       name: Open LLM Leaderboard
-   - task:
-       type: text-generation
-       name: Text Generation
-     dataset:
-       name: TruthfulQA (0-shot)
-       type: truthful_qa
-       config: multiple_choice
-       split: validation
-       args:
-         num_few_shot: 0
-     metrics:
-     - type: mc2
-       value: 58.9
-     source:
-       url: https://huggingface.co/spaces/HuggingFaceH4/open_llm_leaderboard?query=brucethemoose/Yi-34B-200K-DARE-merge-v7
-       name: Open LLM Leaderboard
-   - task:
-       type: text-generation
-       name: Text Generation
-     dataset:
-       name: Winogrande (5-shot)
-       type: winogrande
-       config: winogrande_xl
-       split: validation
-       args:
-         num_few_shot: 5
-     metrics:
-     - type: acc
-       value: 83.11
-       name: accuracy
-     source:
-       url: https://huggingface.co/spaces/HuggingFaceH4/open_llm_leaderboard?query=brucethemoose/Yi-34B-200K-DARE-merge-v7
-       name: Open LLM Leaderboard
-   - task:
-       type: text-generation
-       name: Text Generation
-     dataset:
-       name: GSM8k (5-shot)
-       type: gsm8k
-       config: main
-       split: test
-       args:
-         num_few_shot: 5
-     metrics:
-     - type: acc
-       value: 65.35
-       name: accuracy
-     source:
-       url: https://huggingface.co/spaces/HuggingFaceH4/open_llm_leaderboard?query=brucethemoose/Yi-34B-200K-DARE-merge-v7
-       name: Open LLM Leaderboard
  ---
- # Possibly made obsolete by: https://huggingface.co/brucethemoose/Yi-34B-200K-DARE-megamerge-v8
-
- # Yi 34B 200K DARE Merge v7
-
- A merge of several Yi 34B 200K models using the new DARE TIES method via mergekit. The goal is to create a merged model that excels at 32K+ context performance.
-
- ## Prompt template: Orca-Vicuna
- ```
- SYSTEM: {system_message}
- USER: {prompt}
- ASSISTANT:
- ```
- It might recognize ChatML, and possibly Alpaca-like formats. Raw prompting as described here is also effective: https://old.reddit.com/r/LocalLLaMA/comments/18zqy4s/the_secret_to_writing_quality_stories_with_llms/
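For clarity, here is a minimal Python sketch of assembling a single-turn prompt in this format; the helper name and the example strings are illustrative, not something shipped with the model.

```python
def build_orca_vicuna_prompt(system_message: str, prompt: str) -> str:
    """Assemble a single-turn Orca-Vicuna prompt string for this model."""
    return f"SYSTEM: {system_message}\nUSER: {prompt}\nASSISTANT:"

# Example usage with any backend that accepts a raw prompt string:
text = build_orca_vicuna_prompt(
    "You are a helpful assistant.",
    "Summarize the DARE TIES merge method in two sentences.",
)
print(text)
```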
-
- ## Running
- Being a Yi model, try running a lower temperature with 0.02-0.06 MinP, a little repetition penalty, maybe mirostat with a low tau, and no other samplers. Yi tends to run "hot" by default, and it really needs a low temperature + MinP to cull the huge vocabulary.
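As a rough starting point, those settings can be expressed as a transformers `GenerationConfig`. This is a hedged sketch with illustrative values: `min_p` requires a fairly recent transformers release, and mirostat is not a transformers sampler (it would be configured in a llama.cpp-style backend instead).

```python
from transformers import GenerationConfig

# Illustrative values only: cool temperature plus MinP, a light repetition
# penalty, and no other samplers, per the guidance above.
gen_config = GenerationConfig(
    do_sample=True,
    temperature=0.8,          # run cooler than Yi's default
    min_p=0.05,               # 0.02-0.06 suggested; needs a recent transformers version
    repetition_penalty=1.05,  # "a little" repetition penalty
    max_new_tokens=512,
)
```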
-
- 24GB GPUs can efficiently run Yi-34B-200K models at **45K-90K context** with exllamav2, and performant UIs like [exui](https://github.com/turboderp/exui). I go into more detail in this [post](https://old.reddit.com/r/LocalLLaMA/comments/1896igc/how_i_run_34b_models_at_75k_context_on_24gb_fast/). 16GB GPUs can still run high context with aggressive quantization.
-
- To load/train this in full-context backends like transformers, you *must* change `max_position_embeddings` in config.json to a value lower than 200,000, otherwise you will OOM! I do not recommend running high context without context-efficient backends like exllamav2 or unsloth.
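A minimal sketch of that override when loading with transformers; the local path and the 32K cap are illustrative placeholders, and editing `config.json` directly achieves the same thing.

```python
from transformers import AutoConfig, AutoModelForCausalLM, AutoTokenizer

model_path = "./Yi-34B-200K-DARE-merge-v7"  # illustrative local path

# Cap the context transformers will allocate for, instead of the full 200K.
config = AutoConfig.from_pretrained(model_path)
config.max_position_embeddings = 32768  # pick a value your memory can handle

tokenizer = AutoTokenizer.from_pretrained(model_path)
model = AutoModelForCausalLM.from_pretrained(
    model_path,
    config=config,
    torch_dtype="auto",
    device_map="auto",
)
```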
-
- ## Testing Notes
-
- See: https://huggingface.co/brucethemoose/Yi-34B-200K-DARE-merge-v5#testing-notes
-
- A "4k" merge model was created to try to extend the context of SUS Chat and DPO-bagel before adding them to the merge: https://huggingface.co/brucethemoose/SUS-Bagel-200K-DARE-Test
-
- In addition, the weight gradients are biased towards Vicuna-format models in the first few layers to try to "emphasize" the Orca-Vicuna prompt template. How successful this is remains to be seen.
-
- ### Merge Method
-
- This model was merged with the [DARE](https://arxiv.org/abs/2311.03099) [TIES](https://arxiv.org/abs/2306.01708) merge method, using /home/alpha/Storage/Models/Raw/chargoddard_Yi-34B-200K-Llama as the base.
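For intuition, here is a toy NumPy sketch of the DARE drop-and-rescale step on a single task vector (the delta from the base model). It only illustrates the idea from the paper; it is not mergekit's implementation, and the tensor values are made up.

```python
import numpy as np

def dare_delta(finetuned: np.ndarray, base: np.ndarray, density: float,
               rng=np.random) -> np.ndarray:
    """Toy DARE step: randomly drop (1 - density) of the delta parameters
    and rescale the survivors so the expected delta is preserved."""
    delta = finetuned - base                      # task vector
    mask = rng.random(delta.shape) < density      # keep ~density fraction
    return np.where(mask, delta / density, 0.0)   # rescale kept entries by 1/density

# In a DARE TIES merge, the sparsified deltas from each model are then
# sign-elected, combined with their weights, and added back to the base.
base = np.zeros(8)
ft = np.array([0.2, -0.1, 0.05, 0.3, -0.2, 0.0, 0.1, -0.05])
print(dare_delta(ft, base, density=0.59))
```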
-
- ### Models Merged
-
- The following models were included in the merge:
- * https://huggingface.co/kyujinpy/PlatYi-34B-200k-Q-FastChat
- * https://huggingface.co/jondurbin/bagel-34b-v0.2
- * https://huggingface.co/NousResearch/Nous-Capybara-34B
- * https://huggingface.co/migtissera/Tess-M-Creative-v1.0
- * https://huggingface.co/brucethemoose/SUS-Bagel-200K-DARE-Test
- * https://huggingface.co/Mihaiii/Pallas-0.5
- * https://huggingface.co/bhenrym14/airoboros-3_1-yi-34b-200k
- * https://huggingface.co/adamo1139/Yi-34B-200K-AEZAKMI-v2
- * https://huggingface.co/migtissera/Tess-34B-v1.4
- * https://huggingface.co/SUSTech/SUS-Chat-34B
- * https://huggingface.co/jondurbin/bagel-dpo-34b-v0.2
- * https://huggingface.co/chargoddard/Yi-34B-200K-Llama
- * https://huggingface.co/chargoddard/Yi-34B-Llama
-
- ### Configuration
-
- The following YAML configuration was used to produce this model:
-
- ```yaml
- models:
-   - model: /home/alpha/Storage/Models/Raw/chargoddard_Yi-34B-200K-Llama
-     # No parameters necessary for base model
-   - model: /home/alpha/Storage/Models/Raw/migtissera_Tess-34B-v1.4
-     parameters:
-       weight: [0.23, 0.125, 0.125, 0.125, 0.125, 0.125]
-       density: 0.59
-   - model: /home/alpha/Models/Raw/Mihaiii_Pallas-0.5
-     parameters:
-       weight: [0.23, 0.125, 0.125, 0.125, 0.125, 0.125]
-       density: 0.59
-   - model: /home/alpha//Storage/Models/Raw/bhenrym14_airoboros-3_1-yi-34b-200k
-     parameters:
-       weight: [0.02, 0.106, 0.106, 0.106, 0.106, 0.106]
-       density: 0.59
-   - model: /home/alpha/Storage/Models/Raw/jondurbin_bagel-34b-v0.2
-     # Only the SFT in the main merge since the DPO version seems to have no long context ability at all
-     parameters:
-       weight: [0.02, 0.100, 0.100, 0.100, 0.100, 0.100]
-       density: 0.4
-   - model: /home/alpha/Storage/Models/Raw/kyujinpy_PlatYi-34B-200k-Q-FastChat
-     parameters:
-       weight: [0.02, 0.100, 0.100, 0.100, 0.100, 0.100]
-       density: 0.59
-   #- model: /home/alpha/Storage/Models/Raw/ehartford_dolphin-2.2-yi-34b-200k
-   # Dolphin 200K seems to be funky according to multiple leaderboards and perplexity tests?
-   #  parameters:
-   #    weight: 0.15
-   #    density: 0.6
-   - model: /home/alpha/Models/Raw/adamo1139_Yi-34B-200K-AEZAKMI-v2
-     parameters:
-       weight: [0.02, 0.110, 0.110, 0.110, 0.110, 0.110]
-       density: 0.59
-   - model: /home/alpha/Storage/Models/Raw/Nous-Capybara-34B
-     parameters:
-       weight: [0.22, 0.126, 0.126, 0.126, 0.126, 0.126]
-       density: 0.59
-   - model: /home/alpha/Storage/Models/Raw/4kmerge
-     parameters:
-       weight: [0.02, 0.108, 0.108, 0.108, 0.108, 0.108]
-       density: 0.5
-   - model: /home/alpha/Models/Raw/migtissera_Tess-M-Creative-v1.0
-     parameters:
-       weight: [0.22, 0.100, 0.100, 0.100, 0.100, 0.10]
-       density: 0.59
- merge_method: dare_ties
- tokenizer_source: union
- base_model: /home/alpha/Storage/Models/Raw/chargoddard_Yi-34B-200K-Llama
- parameters:
-   int8_mask: true
- dtype: bfloat16
- ```
-
- The following config was used for the "4kmerge" model:
-
- ```yaml
- models:
-   - model: /home/alpha/Models/Raw/chargoddard_Yi-34B-Llama
-     # No parameters necessary for base model
-   - model: /home/alpha/Storage/Models/Raw/chargoddard_Yi-34B-200K-Llama
-     parameters:
-       weight: 0.5
-       density: 1
-   - model: /home/alpha/Models/Raw/SUSTech_SUS-Chat-34B
-     parameters:
-       weight: 0.2
-       density: 0.12
-   - model: /home/alpha/Models/Raw/jondurbin_bagel-dpo-34b-v0.2
-     parameters:
-       weight: 0.2
-       density: 0.15
-   - model: /home/alpha/Models/Raw/jondurbin_bagel-34b-v0.2
-     parameters:
-       weight: 0.1
-       density: 0.12
- merge_method: dare_ties
- tokenizer_source: union
- base_model: /home/alpha/Models/Raw/chargoddard_Yi-34B-Llama
- parameters:
-   int8_mask: true
- dtype: bfloat16
- ```
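For reference, configs like the ones above are normally executed with the `mergekit-yaml` CLI (for example `mergekit-yaml merge-config.yaml ./merged-model --cuda`) or from Python. The sketch below follows mergekit's documented Python usage, but the exact API can vary between mergekit versions, and the paths are placeholders.

```python
import yaml
import torch

# These imports and signatures follow mergekit's documented Python usage;
# they may differ slightly between mergekit versions.
from mergekit.config import MergeConfiguration
from mergekit.merge import MergeOptions, run_merge

CONFIG_YML = "merge-config.yaml"   # one of the YAML configs above, saved to disk
OUTPUT_PATH = "./merged-model"     # placeholder output directory

with open(CONFIG_YML, "r", encoding="utf-8") as fp:
    merge_config = MergeConfiguration.model_validate(yaml.safe_load(fp))

run_merge(
    merge_config,
    out_path=OUTPUT_PATH,
    options=MergeOptions(
        cuda=torch.cuda.is_available(),
        copy_tokenizer=True,
        lazy_unpickle=False,
        low_cpu_memory=False,
    ),
)
```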
-
- # [Open LLM Leaderboard Evaluation Results](https://huggingface.co/spaces/HuggingFaceH4/open_llm_leaderboard)
- Detailed results can be found [here](https://huggingface.co/datasets/open-llm-leaderboard/details_brucethemoose__Yi-34B-200K-DARE-merge-v7)
-
- | Metric |Value|
- |---------------------------------|----:|
- |Avg. |73.12|
- |AI2 Reasoning Challenge (25-Shot)|68.09|
- |HellaSwag (10-Shot) |85.99|
- |MMLU (5-Shot) |77.30|
- |TruthfulQA (0-shot) |58.90|
- |Winogrande (5-shot) |83.11|
- |GSM8k (5-shot) |65.35|

  - mergekit
  - merge
  - Yi
+ - exl2
+ - exllamav2
  license_name: yi-license
  license_link: https://huggingface.co/01-ai/Yi-34B/blob/main/LICENSE
  base_model: []
  ---

+ # Yi-34B-200K-DARE-merge-v7-6bpw-exl2

+ Original model:
+ - [brucethemoose/Yi-34B-200K-DARE-merge-v7](https://huggingface.co/brucethemoose/Yi-34B-200K-DARE-merge-v7)

+ Available quants:

+ - [coryg89/Yi-34B-200K-DARE-merge-v7-4bpw-exl2](https://huggingface.co/coryg89/Yi-34B-200K-DARE-merge-v7-4bpw-exl2)
+ - [coryg89/Yi-34B-200K-DARE-merge-v7-5bpw-exl2](https://huggingface.co/coryg89/Yi-34B-200K-DARE-merge-v7-5bpw-exl2)
+ - [coryg89/Yi-34B-200K-DARE-merge-v7-6bpw-exl2](https://huggingface.co/coryg89/Yi-34B-200K-DARE-merge-v7-6bpw-exl2)
+ - [coryg89/Yi-34B-200K-DARE-merge-v7-8bpw-exl2](https://huggingface.co/coryg89/Yi-34B-200K-DARE-merge-v7-8bpw-exl2)
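To fetch one of these quants for an exllamav2-based loader (such as exui), the `huggingface_hub` snapshot API works; a minimal sketch, with the destination directory as a placeholder.

```python
from huggingface_hub import snapshot_download

# Download the 6bpw exl2 quant into a local folder for use with an
# exllamav2-based loader.
local_dir = snapshot_download(
    repo_id="coryg89/Yi-34B-200K-DARE-merge-v7-6bpw-exl2",
    local_dir="./Yi-34B-200K-DARE-merge-v7-6bpw-exl2",  # placeholder path
)
print(local_dir)
```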