Update README.md
### Currently ranked #7 globally on LMSYS Arena Hard! 🏆

>### 🚄 Just download this IQ4XM 132GB version, it's the one I use myself in prod:
>🐧 On Linux `sudo apt install -y aria2`
>
>🍎 On Mac `brew install aria2`
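The shard URLs on this page all follow the same `resolve/main/<filename>` pattern. As a sketch, the commands for the four IQ4XM parts can be generated like this; the part-1 filename appears in the perplexity log on this page, while the names of parts 2–4 are assumed to follow the same `-0000N-of-00004` pattern, so verify against the repo's file list before downloading:

```shell
# Sketch: print the aria2c commands for the four IQ4XM shards.
# Parts 2-4 filenames are assumed from the part-1 pattern shown on this page.
base=https://huggingface.co/nisten/deepseek-0628-gguf/resolve/main
for i in 1 2 3 4; do
  f=$(printf 'deepseek_0628_cpu_optimized_iq4xm-%05d-of-00004.gguf' "$i")
  echo "aria2c -x 8 -o $f $base/$f"
done
```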
Final estimate: PPL = 7.6295 +/- 0.36143

deepseek_0628_cpu_optimized_iq4xm-00001-of-00004.gguf
size: 132.1 GiB
perplexity: 59.49 seconds per pass - ETA 2.97 minutes
[1]2.4954,[2]3.3941,[3]2.9607,[4]3.4755,[5]3.8889,[6]4.5036,[7]4.7364,[8]4.9401,[9]5.2737,[10]5.6651,[11]5.7354,[12]5.8620,
Final estimate: PPL = 5.8620 +/- 0.26853
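The 132.1 GiB size is consistent with the "IQ4XM" label. A back-of-envelope check, assuming roughly 236B total parameters for DeepSeek-V2 (a figure not stated on this page):

```shell
# Average bits per weight implied by the 132.1 GiB file size,
# assuming ~236B total parameters (an assumption, not from this README).
awk 'BEGIN { printf "%.2f bpw\n", 132.1 * 2^30 * 8 / 236e9 }'
```

That lands near 4.8 bpw, i.e. a 4-bit mixed quant, which matches the PPL sitting between the full-precision model and the 1–2 bit experiments below.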
```bash
aria2c -x 8 -o deepseek-0628-bf16-00011-of-00011.gguf \
https://huggingface.co/nisten/deepseek-0628-gguf/resolve/main/deepseek-0628-bf16-00011-of-00011.gguf
```

<figure>
<img src="https://cdn-uploads.huggingface.co/production/uploads/6379683a81c1783a4a2ddba8/e4Bt3dpKKt0CPGxjflSdb.png" alt="deepseek-0628-bf16 example response">
<figcaption><strong>deepseek-0628-bf16 (440GB):</strong> Example response from the full bf16 model</figcaption>
</figure>

>[!TIP]
>### 🚄 Even more accelerated download links for other quantizations:
>
>🧪 Experimental versions - the q1s and q1m 1-bit mixes (avg 2.1 bpw and 2.6 bpw) are surprisingly coherent!
```bash
# 2-bit IQ2_XXS version (80.6 GiB total)
aria2c -x 8 -o deepseek-0628-cpu-2bit-IQ2_XXS-00001-of-00002.gguf \
https://huggingface.co/nisten/deepseek-0628-gguf/resolve/main/deepseek-0628-cpu-2bit-IQ2_XXS-00001-of-00002.gguf

aria2c -x 8 -o deepseek-0628-cpu-2bit-IQ2_XXS-00002-of-00002.gguf \
https://huggingface.co/nisten/deepseek-0628-gguf/resolve/main/deepseek-0628-cpu-2bit-IQ2_XXS-00002-of-00002.gguf

# Q6K version (187.1 GiB total)
aria2c -x 8 -o deepseek-0628-cpu-q6k-00001-of-00005.gguf \
https://huggingface.co/nisten/deepseek-0628-gguf/resolve/main/deepseek-0628-cpu-q6k-00001-of-00005.gguf

aria2c -x 8 -o deepseek-0628-cpu-q6k-00005-of-00005.gguf \
https://huggingface.co/nisten/deepseek-0628-gguf/resolve/main/deepseek-0628-cpu-q6k-00005-of-00005.gguf

# Q4_0_8_8 faster but dumber version (~169.3GB total)
aria2c -x 8 -o deepseek-0628-q4_0_8_8_faster_dumber-00001-of-00004.gguf \
https://huggingface.co/nisten/deepseek-0628-gguf/resolve/main/deepseek-0628-q4_0_8_8_faster_dumber-00001-of-00004.gguf
```
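Every quant here ships as a multi-part split, so a flaky connection can leave you one shard short. A small, hypothetical helper (not part of this repo) that checks the current directory for missing parts of a `-NNNNN-of-NNNNN.gguf` split:

```shell
# Hypothetical helper: report missing shards of a split .gguf download.
# Usage: missing_shards <total_parts> <filename_prefix>
missing_shards() {
  total=$1
  prefix=$2
  i=1
  while [ "$i" -le "$total" ]; do
    f=$(printf '%s-%05d-of-%05d.gguf' "$prefix" "$i" "$total")
    # Print any expected shard that is not present in the current directory.
    [ -e "$f" ] || echo "missing: $f"
    i=$((i + 1))
  done
}

# Example: missing_shards 2 deepseek-0628-cpu-2bit-IQ2_XXS
```

Once all parts are present, point llama.cpp at the first shard; it picks up the remaining `-of-NNNNN` files from the same directory.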
The following 1-bit mixed quant versions are strangely good:

<figure>
<img src="https://cdn-uploads.huggingface.co/production/uploads/6379683a81c1783a4a2ddba8/Qxx4p2l0prHiScCdL68XK.png" alt="deepseek_0628_cpu-iq1m example response">
<figcaption><strong>deepseek_0628_cpu-iq1m (73.27 GB):</strong> The mixed 1-bit response is strangely good</figcaption>
</figure>

<figure>
<img src="https://cdn-uploads.huggingface.co/production/uploads/6379683a81c1783a4a2ddba8/N0lQ5HAJbNbXIG1MbtB4x.png" alt="deepseek_0628_cpu-iq1s example response">
<figcaption><strong>deepseek_0628_cpu-iq1s (58.42 GB):</strong> Even the smallest IQ1_S version (52.7GB total) is coherent with these custom quants</figcaption>
</figure>

```bash
# IQ1_S version (58.42 GB)