nisten committed
Commit 3fdc869
1 Parent(s): e885683

Update README.md

Files changed (1):
  1. README.md +16 -8
README.md CHANGED
@@ -7,7 +7,7 @@ base_model: [deepseek-ai/DeepSeek-V2-Chat-0628]
 ### Currently ranked #7 globally on LMSYS Arena Hard! 🏆
 
 
->### 🚄 Just download this IQ4XM 131Gb version, it's the one I use myself in prod:
+>### 🚄 Just download this IQ4XM 132Gb version, it's the one I use myself in prod:
 >🐧 On Linux `sudo apt install -y aria2`
 >
 >🍎 On Mac `brew install aria2`
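Editor's note: the recommended IQ4XM shards can be fetched with the same `aria2c -x 8 -o` pattern the README uses for its other quants. A minimal sketch, assuming all four shard names follow the `deepseek_0628_cpu_optimized_iq4xm-0000N-of-00004.gguf` pattern seen in the perplexity log in the next hunk:

```bash
# Sketch only, not copied from the README's download section: fetch the four
# IQ4XM shards with the -x 8 / -o pattern used elsewhere in this model card.
# Shard names are inferred from the perplexity log below.
for i in 1 2 3 4; do
  f=$(printf 'deepseek_0628_cpu_optimized_iq4xm-%05d-of-00004.gguf' "$i")
  aria2c -x 8 -o "$f" \
    "https://huggingface.co/nisten/deepseek-0628-gguf/resolve/main/$f"
done
```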
@@ -69,7 +69,7 @@ perplexity: 94.39 seconds per pass - ETA 4.72 minutes
 Final estimate: PPL = 7.6295 +/- 0.36143
 
 deepseek_0628_cpu_optimized_iq4xm-00001-of-00004.gguf
-size: 131Gb
+size: 132.1 GiB
 perplexity: 59.49 seconds per pass - ETA 2.97 minutes
 [1]2.4954,[2]3.3941,[3]2.9607,[4]3.4755,[5]3.8889,[6]4.5036,[7]4.7364,[8]4.9401,[9]5.2737,[10]5.6651,[11]5.7354,[12]5.8620,
 Final estimate: PPL = 5.8620 +/- 0.26853
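Editor's note: these traces match the output format of llama.cpp's perplexity tool; a run along the following lines would produce them. The evaluation text file, thread count, and chunk limit are assumptions, not taken from the README.

```bash
# Minimal sketch of a llama.cpp perplexity run matching the log format above.
# Pointing -m at the first shard is enough; llama.cpp loads the remaining
# -0000N-of-00004.gguf shards of a split GGUF automatically.
# The wikitext path, --chunks 12, and -t 32 are assumptions.
./llama-perplexity \
  -m deepseek_0628_cpu_optimized_iq4xm-00001-of-00004.gguf \
  -f wikitext-2-raw/wiki.test.raw \
  --chunks 12 -t 32
```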
@@ -135,20 +135,28 @@ aria2c -x 8 -o deepseek-0628-bf16-00011-of-00011.gguf \
 https://huggingface.co/nisten/deepseek-0628-gguf/resolve/main/deepseek-0628-bf16-00011-of-00011.gguf
 ```
 
+
+![image/png](https://cdn-uploads.huggingface.co/production/uploads/6379683a81c1783a4a2ddba8/e4Bt3dpKKt0CPGxjflSdb.png)
+
+<figure>
+<img src="https://cdn-uploads.huggingface.co/production/uploads/6379683a81c1783a4a2ddba8/e4Bt3dpKKt0CPGxjflSdb.png" alt="deepseek-0628-bf16 example response">
+<figcaption><strong>deepseek-0628-bf16 (440GB):</strong> Example response from the full bf16 model</figcaption>
+</figure>
+
 >[!TIP]
->### 🚄 Even more quantizations for the speed demons and size optimizers:
+>### 🚄 Even more accelerated download links for other quantizations:
 >
 >🧪 Experimental versions - the q1s and q1m 1-bit quants (avg 2.1 bpw and 2.6 bpw) are surprisingly coherent!
 
 ```bash
-# 2-bit IQ2_XXS version (86.5GB total)
+# 2-bit IQ2_XXS version (80.6 GiB total)
 aria2c -x 8 -o deepseek-0628-cpu-2bit-IQ2_XXS-00001-of-00002.gguf \
 https://huggingface.co/nisten/deepseek-0628-gguf/resolve/main/deepseek-0628-cpu-2bit-IQ2_XXS-00001-of-00002.gguf
 
 aria2c -x 8 -o deepseek-0628-cpu-2bit-IQ2_XXS-00002-of-00002.gguf \
 https://huggingface.co/nisten/deepseek-0628-gguf/resolve/main/deepseek-0628-cpu-2bit-IQ2_XXS-00002-of-00002.gguf
 
-# Q6K version (200.9GB total)
+# Q6K version (187.1 GiB total)
 aria2c -x 8 -o deepseek-0628-cpu-q6k-00001-of-00005.gguf \
 https://huggingface.co/nisten/deepseek-0628-gguf/resolve/main/deepseek-0628-cpu-q6k-00001-of-00005.gguf
 
@@ -164,7 +172,7 @@ aria2c -x 8 -o deepseek-0628-cpu-q6k-00004-of-00005.gguf \
 aria2c -x 8 -o deepseek-0628-cpu-q6k-00005-of-00005.gguf \
 https://huggingface.co/nisten/deepseek-0628-gguf/resolve/main/deepseek-0628-cpu-q6k-00005-of-00005.gguf
 
-# Q4_0_8_8 faster but dumber version (181.8GB total)
+# Q4_0_8_8 faster but dumber version (~169.3GB total)
 aria2c -x 8 -o deepseek-0628-q4_0_8_8_faster_dumber-00001-of-00004.gguf \
 https://huggingface.co/nisten/deepseek-0628-gguf/resolve/main/deepseek-0628-q4_0_8_8_faster_dumber-00001-of-00004.gguf
 
@@ -182,7 +190,7 @@ The following 1 bit mixed quant versions are strangely good:
 
 <figure>
 <img src="https://cdn-uploads.huggingface.co/production/uploads/6379683a81c1783a4a2ddba8/Qxx4p2l0prHiScCdL68XK.png" alt="deepseek_0628_cpu-iq1m example response">
-<figcaption><strong>deepseek_0628_cpu-iq1m (73.27 GB):</strong> example response is strangely good</figcaption>
+<figcaption><strong>deepseek_0628_cpu-iq1m (73.27 GB):</strong> Mixed 1-bit example response is strangely good</figcaption>
 </figure>
 
 
@@ -196,7 +204,7 @@ aria2c -x 8 -o deepseek_0628_cpu-iq1m-00002-of-00002.gguf \
 ```
 <figure>
 <img src="https://cdn-uploads.huggingface.co/production/uploads/6379683a81c1783a4a2ddba8/N0lQ5HAJbNbXIG1MbtB4x.png" alt="deepseek_0628_cpu-iq1s example response">
-<figcaption><strong>deepseek_0628_cpu-iq1s (58.42 GB):</strong> Even the IQ1_S version (52.7GB total)</figcaption>
+<figcaption><strong>deepseek_0628_cpu-iq1s (58.42 GB):</strong> Even the smallest IQ1_S version (52.7GB total) is coherent with these custom quants</figcaption>
 </figure>
 ```bash
 # IQ1_S version (58.42 GB)
 