Update README.md
### Currently ranked #7 globally on LMSYS Arena Hard! 🏆

>### 🚄 Just download this IQ4XM 132GB version, it's the one I use myself in prod:
>🐧 On Linux `sudo apt install -y aria2`
>
>🍎 On Mac `brew install aria2`
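The shard URLs on this page all follow the same `resolve/main/<filename>` pattern. As a sketch, the commands for the four IQ4XM parts can be generated like this; the part-1 filename appears in the perplexity log on this page, while the names of parts 2–4 are assumed to follow the same `-0000N-of-00004` pattern, so verify against the repo's file list before downloading:

```shell
# Sketch: print the aria2c commands for the four IQ4XM shards.
# Parts 2-4 filenames are assumed from the part-1 pattern shown on this page.
base=https://huggingface.co/nisten/deepseek-0628-gguf/resolve/main
for i in 1 2 3 4; do
  f=$(printf 'deepseek_0628_cpu_optimized_iq4xm-%05d-of-00004.gguf' "$i")
  echo "aria2c -x 8 -o $f $base/$f"
done
```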
Final estimate: PPL = 7.6295 +/- 0.36143

deepseek_0628_cpu_optimized_iq4xm-00001-of-00004.gguf
size: 132.1 GiB
perplexity: 59.49 seconds per pass - ETA 2.97 minutes
[1]2.4954,[2]3.3941,[3]2.9607,[4]3.4755,[5]3.8889,[6]4.5036,[7]4.7364,[8]4.9401,[9]5.2737,[10]5.6651,[11]5.7354,[12]5.8620,
Final estimate: PPL = 5.8620 +/- 0.26853
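The 132.1 GiB size is consistent with the "IQ4XM" label. A back-of-envelope check, assuming roughly 236B total parameters for DeepSeek-V2 (a figure not stated on this page):

```shell
# Average bits per weight implied by the 132.1 GiB file size,
# assuming ~236B total parameters (an assumption, not from this README).
awk 'BEGIN { printf "%.2f bpw\n", 132.1 * 2^30 * 8 / 236e9 }'
```

That lands near 4.8 bpw, i.e. a 4-bit mixed quant, which matches the PPL sitting between the full-precision model and the 1–2 bit experiments below.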
```bash
aria2c -x 8 -o deepseek-0628-bf16-00011-of-00011.gguf \
https://huggingface.co/nisten/deepseek-0628-gguf/resolve/main/deepseek-0628-bf16-00011-of-00011.gguf
```

<figure>
<img src="https://cdn-uploads.huggingface.co/production/uploads/6379683a81c1783a4a2ddba8/e4Bt3dpKKt0CPGxjflSdb.png" alt="deepseek-0628-bf16 example response">
<figcaption><strong>deepseek-0628-bf16 (440GB):</strong> Example response from the full bf16 model</figcaption>
</figure>

>[!TIP]
>### 🚄 Even more accelerated download links for other quantizations:
>
>🧪 Experimental versions - the q1s and q1m 1-bit mixes (avg 2.1 bpw and 2.6 bpw) are surprisingly coherent!
```bash
# 2-bit IQ2_XXS version (80.6 GiB total)
aria2c -x 8 -o deepseek-0628-cpu-2bit-IQ2_XXS-00001-of-00002.gguf \
https://huggingface.co/nisten/deepseek-0628-gguf/resolve/main/deepseek-0628-cpu-2bit-IQ2_XXS-00001-of-00002.gguf

aria2c -x 8 -o deepseek-0628-cpu-2bit-IQ2_XXS-00002-of-00002.gguf \
https://huggingface.co/nisten/deepseek-0628-gguf/resolve/main/deepseek-0628-cpu-2bit-IQ2_XXS-00002-of-00002.gguf

# Q6K version (187.1 GiB total)
aria2c -x 8 -o deepseek-0628-cpu-q6k-00001-of-00005.gguf \
https://huggingface.co/nisten/deepseek-0628-gguf/resolve/main/deepseek-0628-cpu-q6k-00001-of-00005.gguf

aria2c -x 8 -o deepseek-0628-cpu-q6k-00005-of-00005.gguf \
https://huggingface.co/nisten/deepseek-0628-gguf/resolve/main/deepseek-0628-cpu-q6k-00005-of-00005.gguf

# Q4_0_8_8 faster but dumber version (~169.3GB total)
aria2c -x 8 -o deepseek-0628-q4_0_8_8_faster_dumber-00001-of-00004.gguf \
https://huggingface.co/nisten/deepseek-0628-gguf/resolve/main/deepseek-0628-q4_0_8_8_faster_dumber-00001-of-00004.gguf
```
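Every quant here ships as a multi-part split, so a flaky connection can leave you one shard short. A small, hypothetical helper (not part of this repo) that checks the current directory for missing parts of a `-NNNNN-of-NNNNN.gguf` split:

```shell
# Hypothetical helper: report missing shards of a split .gguf download.
# Usage: missing_shards <total_parts> <filename_prefix>
missing_shards() {
  total=$1
  prefix=$2
  i=1
  while [ "$i" -le "$total" ]; do
    f=$(printf '%s-%05d-of-%05d.gguf' "$prefix" "$i" "$total")
    # Print any expected shard that is not present in the current directory.
    [ -e "$f" ] || echo "missing: $f"
    i=$((i + 1))
  done
}

# Example: missing_shards 2 deepseek-0628-cpu-2bit-IQ2_XXS
```

Once all parts are present, point llama.cpp at the first shard; it picks up the remaining `-of-NNNNN` files from the same directory.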
The following 1-bit mixed quant versions are strangely good:

<figure>
<img src="https://cdn-uploads.huggingface.co/production/uploads/6379683a81c1783a4a2ddba8/Qxx4p2l0prHiScCdL68XK.png" alt="deepseek_0628_cpu-iq1m example response">
<figcaption><strong>deepseek_0628_cpu-iq1m (73.27 GB):</strong> The mixed 1-bit response is strangely good</figcaption>
</figure>

<figure>
<img src="https://cdn-uploads.huggingface.co/production/uploads/6379683a81c1783a4a2ddba8/N0lQ5HAJbNbXIG1MbtB4x.png" alt="deepseek_0628_cpu-iq1s example response">
<figcaption><strong>deepseek_0628_cpu-iq1s (58.42 GB):</strong> Even the smallest IQ1_S version (52.7GB total) is coherent with these custom quants</figcaption>
</figure>

```bash
# IQ1_S version (58.42 GB)