---
base_model: [deepseek-ai/DeepSeek-V2-Chat-0628]
---
# 🚀 My custom quantizations of [DeepSeek-V2-Chat-0628](https://huggingface.co/deepseek-ai/DeepSeek-V2-Chat-0628) optimized for CPU inference 🖥️
### Ranked #7 globally in the LMSYS Chatbot Arena Hard Prompts category at release! 🏆
>### 🚄 Just download this IQ4XM 132 GiB version; it's the one I use myself in prod:
>🐧 On Debian/Ubuntu Linux `sudo apt install -y aria2`
>
>🍎 On Mac `brew install aria2`
>
>These commands open 9 parallel connections per file (`-x 9`), which is typically much faster than a single-stream download. Paste them in all at once or one at a time.
```bash
aria2c -x 9 -o deepseek_0628_cpu_optimized_iq4xm-00001-of-00004.gguf \
https://huggingface.co/nisten/deepseek-0628-gguf/resolve/main/deepseek_0628_cpu_optimized_iq4xm-00001-of-00004.gguf
aria2c -x 9 -o deepseek_0628_cpu_optimized_iq4xm-00002-of-00004.gguf \
https://huggingface.co/nisten/deepseek-0628-gguf/resolve/main/deepseek_0628_cpu_optimized_iq4xm-00002-of-00004.gguf
aria2c -x 9 -o deepseek_0628_cpu_optimized_iq4xm-00003-of-00004.gguf \
https://huggingface.co/nisten/deepseek-0628-gguf/resolve/main/deepseek_0628_cpu_optimized_iq4xm-00003-of-00004.gguf
aria2c -x 9 -o deepseek_0628_cpu_optimized_iq4xm-00004-of-00004.gguf \
https://huggingface.co/nisten/deepseek-0628-gguf/resolve/main/deepseek_0628_cpu_optimized_iq4xm-00004-of-00004.gguf
```
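Every file in this repo follows the same `NAME-0000N-of-0000M.gguf` shard naming, so instead of pasting each line you can loop over the shards. Here's a minimal bash sketch, assuming `aria2c` is installed. `download_shards` and the `DRY_RUN` switch are hypothetical names made up for this example, not part of aria2 or llama.cpp.

```bash
#!/usr/bin/env bash
# Sketch: download every shard of a split GGUF from this repo in a loop.
# download_shards and DRY_RUN are illustrative names, not part of aria2 or llama.cpp.
REPO="https://huggingface.co/nisten/deepseek-0628-gguf/resolve/main"

# download_shards PREFIX SHARD_COUNT [CONNECTIONS]
# Builds names like PREFIX-00001-of-00004.gguf, matching the files listed in this README.
download_shards() {
  local prefix="$1" count="$2" conns="${3:-9}"
  local total i file
  total=$(printf '%05d' "$count")
  for ((i = 1; i <= count; i++)); do
    file="${prefix}-$(printf '%05d' "$i")-of-${total}.gguf"
    if [[ "${DRY_RUN:-0}" == "1" ]]; then
      # only print the command that would run
      echo "aria2c -x ${conns} -o ${file} ${REPO}/${file}"
    else
      aria2c -x "${conns}" -o "${file}" "${REPO}/${file}"
    fi
  done
}
```

Paste it into your shell (or `source` the file), then run e.g. `download_shards deepseek_0628_cpu_optimized_iq4xm 4`. The same pattern covers the other quants, e.g. `download_shards deepseek-0628-q8_0 6 8`, and `DRY_RUN=1 download_shards deepseek-0628-bf16 11` just prints the eleven BF16 commands without downloading anything.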
>[!TIP]
>Then, to get a command-line conversation interface, all you need is:
```bash
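git clone https://github.com/ggerganov/llama.cpp && cd llama.cpp && make -j
# newer llama.cpp no longer supports the Makefile build; use: cmake -B build && cmake --build build --config Release -j (binaries land in build/bin/)
# -m: point at your own download dir (~/r/ is the author's); -t 62: set to your own CPU core count
./llama-cli -m ~/r/deepseek_0628_cpu_optimized_iq4xm-00001-of-00004.gguf -t 62 --temp 0.4 -co -cnv -i -c 3000 -p "Adopt the persona of a full-stack developer at NASA JPL."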
```
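The `-t 62` above is tuned to the author's machine. If you'd rather derive a starting value from your own CPU, here's a small bash sketch. It assumes the common rule of thumb that physical cores (not hyperthreads) make a good default for `-t`; benchmark a few values yourself before settling on one. `physical_cores` is a hypothetical helper name.

```bash
#!/usr/bin/env bash
# Sketch: suggest a llama-cli -t value from the number of physical CPU cores.
# Assumption: physical cores (not hyperthreads) are a sensible starting point;
# benchmark a few values on your own machine before settling on one.
physical_cores() {
  local n=0
  if [[ "$(uname -s)" == "Darwin" ]]; then
    n=$(sysctl -n hw.physicalcpu)
  elif command -v lscpu >/dev/null 2>&1; then
    # one "core,socket" line per logical CPU; unique pairs = physical cores
    n=$(lscpu --parse=core,socket 2>/dev/null | grep -v '^#' | sort -u | wc -l)
  fi
  # fallback: logical CPU count if nothing useful was found above
  if (( n < 1 )); then
    n=$(getconf _NPROCESSORS_ONLN)
  fi
  echo "$((n))"
}

echo "suggested llama-cli flag: -t $(physical_cores)"
```

You can then pass it straight through, e.g. `./llama-cli -m ... -t "$(physical_cores)" ...`.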
### 🧠 This IQ4XM version combines the GGML IQ4_XS 4-bit type with Q8_0 8-bit tensors for blazing-fast performance with minimal quality loss, taking advantage of the int8 optimizations on most newer server CPUs.
### 🛠️ It took some custom code wizardry, but it's fully compatible with standard llama.cpp from GitHub, or you can just search for "nisten" in LM Studio.
>[!TIP]
>
>📁 No need for file concatenation - just point llama-cli at the first file and watch the magic happen!
>
>💻 Ready to delve in baby? Here's your command-line spell for interactive mode (prompt.txt is optional, but recommended for maximum sorcery):
>```bash
>./llama-cli --temp 0.4 -m deepseek_0628_cpu_optimized_iq4xm-00001-of-00004.gguf -c 32000 -co -cnv -i -f prompt.txt
>```
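`prompt.txt` is just a plain text file that llama-cli reads its prompt from via `-f`. A minimal way to create one is a bash heredoc; the persona wording below is an illustrative placeholder (it reuses the `-p` persona from the earlier command plus one made-up line), not a prompt recommended by the author.

```bash
#!/usr/bin/env bash
# Sketch: create a plain-text prompt.txt for llama-cli's -f flag.
# The wording is only an example; swap in your own instructions.
if [[ -e prompt.txt ]]; then
  echo "prompt.txt already exists, leaving it alone" >&2
else
  cat > prompt.txt <<'EOF'
Adopt the persona of a full-stack developer at NASA JPL.
Explain your reasoning briefly and include code when it helps.
EOF
fi
```

Then run the `llama-cli ... -f prompt.txt` command from the same directory.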
### Perplexity benchmarks
```bash
./llama-perplexity -m ~/r/deepseek_0628_cpu_optimized_iq4xm-00001-of-00004.gguf --chunks 12 -f ~/wiki.test.raw
```
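The 4-bit IQ4XM actually scores slightly better perplexity than BF16 (5.8620 vs 5.8734), but that 0.0114 gap sits far inside the ±0.27 error margin, so it's just noise: treat the two as equivalent.
```text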
deepseek-0628-bf16-00001-of-00011.gguf
Model size: 440 GiB
perplexity: 735.50 seconds per pass - ETA 36.77 minutes
[1]2.4827,[2]3.3887,[3]2.9470,[4]3.4768,[5]3.9012,[6]4.5128,[7]4.7533,[8]4.9550,[9]5.2863,[10]5.6824,[11]5.7541,[12]5.8734,
Final estimate: PPL = 5.8734 +/- 0.26967
deepseek_0628_cpu_optimized_iq4xm-00001-of-00004.gguf
size: 132.1 GiB
perplexity: 59.49 seconds per pass - ETA 2.97 minutes
[1]2.4954,[2]3.3941,[3]2.9607,[4]3.4755,[5]3.8889,[6]4.5036,[7]4.7364,[8]4.9401,[9]5.2737,[10]5.6651,[11]5.7354,[12]5.8620,
Final estimate: PPL = 5.8620 +/- 0.26853
deepseek_0628_cpu-iq1m-00001-of-00002.gguf
model size = 73.27 GiB (2.67 BPW)
perplexity: 96.54 seconds per pass - ETA 4.82 minutes
[1]3.4340,[2]4.5503,[3]4.0118,[4]4.5807,[5]4.9540,[6]5.7353,[7]5.9430,[8]6.1320,[9]6.5690,[10]6.9401,[11]7.0363,[12]7.1857,
Final estimate: PPL = 7.1857 +/- 0.33585
deepseek_0628_cpu_iq1_s-00001-of-00002.gguf
model size = 58.42 GiB (2.13 BPW)
perplexity: 94.39 seconds per pass - ETA 4.72 minutes
[1]3.3257,[2]4.7059,[3]4.3868,[4]4.8870,[5]5.3162,[6]6.0753,[7]6.2931,[8]6.5085,[9]6.8913,[10]7.3148,[11]7.4235,[12]7.6295,
Final estimate: PPL = 7.6295 +/- 0.36143
deepseek-0628-cpu-2bit-IQ2_XXS-00001-of-00002.gguf
model size = 80.58 GiB (2.94 BPW)
[1]2.7202,[2]3.9132,[3]3.5575,[4]4.0150,[5]4.4171,[6]5.0741,[7]5.2683,[8]5.4653,[9]5.8189,[10]6.2432,[11]6.3324,[12]6.4842,
Final estimate: PPL = 6.4842 +/- 0.29700
```
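To put these numbers in perspective, here's a rough bash + awk sketch that compares each quant's final estimate with the BF16 baseline. The `ppl_vs_baseline` helper and its "within error" rule (gap smaller than the larger of the two ± margins) are an illustrative heuristic, not something llama-perplexity reports; all runs use the same 12 chunks, so their errors are correlated anyway.

```bash
#!/usr/bin/env bash
# Sketch: compare each quant's "Final estimate" PPL with the BF16 baseline.
# The within/outside-error rule is an illustrative heuristic, not llama.cpp output.
ppl_vs_baseline() {
  local name="$1" ppl="$2" err="$3" base="$4" base_err="$5"
  awk -v n="$name" -v p="$ppl" -v e="$err" -v b="$base" -v be="$base_err" 'BEGIN {
    p += 0; e += 0; b += 0; be += 0     # force numeric comparisons
    gap = p - b                         # positive = worse than baseline
    abs_gap = (gap < 0) ? -gap : gap
    margin = (e > be) ? e : be          # larger of the two +/- values
    pct = 100 * gap / b                 # relative change vs baseline
    verdict = (abs_gap < margin) ? "within error" : "outside error"
    printf "%s: %.2f%% vs bf16, gap %.4f, margin %.4f, %s\n", n, pct, abs_gap, margin, verdict
  }'
}

# values copied from the llama-perplexity output above
BF16_PPL=5.8734
BF16_ERR=0.26967
ppl_vs_baseline iq4xm   5.8620 0.26853 "$BF16_PPL" "$BF16_ERR"
ppl_vs_baseline iq2_xxs 6.4842 0.29700 "$BF16_PPL" "$BF16_ERR"
ppl_vs_baseline iq1m    7.1857 0.33585 "$BF16_PPL" "$BF16_ERR"
ppl_vs_baseline iq1_s   7.6295 0.36143 "$BF16_PPL" "$BF16_ERR"
```

With the numbers above, IQ4XM comes out at -0.19% (within error), while IQ2_XXS, IQ1M and IQ1_S land at 10.40%, 22.34% and 29.90% worse than BF16, all outside the error margin.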
>[!TIP]
>### 🚄 More scripts for accelerated downloads:
>
```bash
# 🏋️ For the nearly lossless Q8_0 version
aria2c -x 8 -o deepseek-0628-q8_0-00001-of-00006.gguf \
https://huggingface.co/nisten/deepseek-0628-gguf/resolve/main/deepseek-0628-q8_0-00001-of-00006.gguf
aria2c -x 8 -o deepseek-0628-q8_0-00002-of-00006.gguf \
https://huggingface.co/nisten/deepseek-0628-gguf/resolve/main/deepseek-0628-q8_0-00002-of-00006.gguf
aria2c -x 8 -o deepseek-0628-q8_0-00003-of-00006.gguf \
https://huggingface.co/nisten/deepseek-0628-gguf/resolve/main/deepseek-0628-q8_0-00003-of-00006.gguf
aria2c -x 8 -o deepseek-0628-q8_0-00004-of-00006.gguf \
https://huggingface.co/nisten/deepseek-0628-gguf/resolve/main/deepseek-0628-q8_0-00004-of-00006.gguf
aria2c -x 8 -o deepseek-0628-q8_0-00005-of-00006.gguf \
https://huggingface.co/nisten/deepseek-0628-gguf/resolve/main/deepseek-0628-q8_0-00005-of-00006.gguf
aria2c -x 8 -o deepseek-0628-q8_0-00006-of-00006.gguf \
https://huggingface.co/nisten/deepseek-0628-gguf/resolve/main/deepseek-0628-q8_0-00006-of-00006.gguf
```
```bash
# 🧠 For the full-brain BF16 version
aria2c -x 8 -o deepseek-0628-bf16-00001-of-00011.gguf \
https://huggingface.co/nisten/deepseek-0628-gguf/resolve/main/deepseek-0628-bf16-00001-of-00011.gguf
aria2c -x 8 -o deepseek-0628-bf16-00002-of-00011.gguf \
https://huggingface.co/nisten/deepseek-0628-gguf/resolve/main/deepseek-0628-bf16-00002-of-00011.gguf
aria2c -x 8 -o deepseek-0628-bf16-00003-of-00011.gguf \
https://huggingface.co/nisten/deepseek-0628-gguf/resolve/main/deepseek-0628-bf16-00003-of-00011.gguf
aria2c -x 8 -o deepseek-0628-bf16-00004-of-00011.gguf \
https://huggingface.co/nisten/deepseek-0628-gguf/resolve/main/deepseek-0628-bf16-00004-of-00011.gguf
aria2c -x 8 -o deepseek-0628-bf16-00005-of-00011.gguf \
https://huggingface.co/nisten/deepseek-0628-gguf/resolve/main/deepseek-0628-bf16-00005-of-00011.gguf
aria2c -x 8 -o deepseek-0628-bf16-00006-of-00011.gguf \
https://huggingface.co/nisten/deepseek-0628-gguf/resolve/main/deepseek-0628-bf16-00006-of-00011.gguf
aria2c -x 8 -o deepseek-0628-bf16-00007-of-00011.gguf \
https://huggingface.co/nisten/deepseek-0628-gguf/resolve/main/deepseek-0628-bf16-00007-of-00011.gguf
aria2c -x 8 -o deepseek-0628-bf16-00008-of-00011.gguf \
https://huggingface.co/nisten/deepseek-0628-gguf/resolve/main/deepseek-0628-bf16-00008-of-00011.gguf
aria2c -x 8 -o deepseek-0628-bf16-00009-of-00011.gguf \
https://huggingface.co/nisten/deepseek-0628-gguf/resolve/main/deepseek-0628-bf16-00009-of-00011.gguf
aria2c -x 8 -o deepseek-0628-bf16-00010-of-00011.gguf \
https://huggingface.co/nisten/deepseek-0628-gguf/resolve/main/deepseek-0628-bf16-00010-of-00011.gguf
aria2c -x 8 -o deepseek-0628-bf16-00011-of-00011.gguf \
https://huggingface.co/nisten/deepseek-0628-gguf/resolve/main/deepseek-0628-bf16-00011-of-00011.gguf
```
<figure>
<img src="https://cdn-uploads.huggingface.co/production/uploads/6379683a81c1783a4a2ddba8/e4Bt3dpKKt0CPGxjflSdb.png" alt="deepseek-0628-bf16 example response">
<figcaption><strong>deepseek-0628-bf16 (440 GiB):</strong> Example response from the full BF16 model</figcaption>
</figure>
>[!TIP]
>### 🚄 Even more accelerated download links for other quantizations:
>
>🧪 Experimental versions: the 1-bit IQ1_S and IQ1_M mixes (averaging 2.13 and 2.67 BPW) are surprisingly coherent!
>
```bash
# 2-bit IQ2_XXS version (80.6 GiB total)
aria2c -x 8 -o deepseek-0628-cpu-2bit-IQ2_XXS-00001-of-00002.gguf \
https://huggingface.co/nisten/deepseek-0628-gguf/resolve/main/deepseek-0628-cpu-2bit-IQ2_XXS-00001-of-00002.gguf
aria2c -x 8 -o deepseek-0628-cpu-2bit-IQ2_XXS-00002-of-00002.gguf \
https://huggingface.co/nisten/deepseek-0628-gguf/resolve/main/deepseek-0628-cpu-2bit-IQ2_XXS-00002-of-00002.gguf
# Q6K version (187.1 GiB total)
aria2c -x 8 -o deepseek-0628-cpu-q6k-00001-of-00005.gguf \
https://huggingface.co/nisten/deepseek-0628-gguf/resolve/main/deepseek-0628-cpu-q6k-00001-of-00005.gguf
aria2c -x 8 -o deepseek-0628-cpu-q6k-00002-of-00005.gguf \
https://huggingface.co/nisten/deepseek-0628-gguf/resolve/main/deepseek-0628-cpu-q6k-00002-of-00005.gguf
aria2c -x 8 -o deepseek-0628-cpu-q6k-00003-of-00005.gguf \
https://huggingface.co/nisten/deepseek-0628-gguf/resolve/main/deepseek-0628-cpu-q6k-00003-of-00005.gguf
aria2c -x 8 -o deepseek-0628-cpu-q6k-00004-of-00005.gguf \
https://huggingface.co/nisten/deepseek-0628-gguf/resolve/main/deepseek-0628-cpu-q6k-00004-of-00005.gguf
aria2c -x 8 -o deepseek-0628-cpu-q6k-00005-of-00005.gguf \
https://huggingface.co/nisten/deepseek-0628-gguf/resolve/main/deepseek-0628-cpu-q6k-00005-of-00005.gguf
# Q4_0_8_8 faster but dumber version (~169.3GB total)
aria2c -x 8 -o deepseek-0628-q4_0_8_8_faster_dumber-00001-of-00004.gguf \
https://huggingface.co/nisten/deepseek-0628-gguf/resolve/main/deepseek-0628-q4_0_8_8_faster_dumber-00001-of-00004.gguf
aria2c -x 8 -o deepseek-0628-q4_0_8_8_faster_dumber-00002-of-00004.gguf \
https://huggingface.co/nisten/deepseek-0628-gguf/resolve/main/deepseek-0628-q4_0_8_8_faster_dumber-00002-of-00004.gguf
aria2c -x 8 -o deepseek-0628-q4_0_8_8_faster_dumber-00003-of-00004.gguf \
https://huggingface.co/nisten/deepseek-0628-gguf/resolve/main/deepseek-0628-q4_0_8_8_faster_dumber-00003-of-00004.gguf
aria2c -x 8 -o deepseek-0628-q4_0_8_8_faster_dumber-00004-of-00004.gguf \
https://huggingface.co/nisten/deepseek-0628-gguf/resolve/main/deepseek-0628-q4_0_8_8_faster_dumber-00004-of-00004.gguf
```
The following 1 bit mixed quant versions are strangely good:
<figure>
<img src="https://cdn-uploads.huggingface.co/production/uploads/6379683a81c1783a4a2ddba8/Qxx4p2l0prHiScCdL68XK.png" alt="deepseek_0628_cpu-iq1m example response">
<figcaption><strong>deepseek_0628_cpu-iq1m (73.27 GiB):</strong> The mixed 1-bit response is strangely good</figcaption>
</figure>
```bash
# IQ1M version (73.27 GiB)
aria2c -x 8 -o deepseek_0628_cpu-iq1m-00001-of-00002.gguf \
https://huggingface.co/nisten/deepseek-0628-gguf/resolve/main/deepseek_0628_cpu-iq1m-00001-of-00002.gguf
aria2c -x 8 -o deepseek_0628_cpu-iq1m-00002-of-00002.gguf \
https://huggingface.co/nisten/deepseek-0628-gguf/resolve/main/deepseek_0628_cpu-iq1m-00002-of-00002.gguf
```
<figure>
<img src="https://cdn-uploads.huggingface.co/production/uploads/6379683a81c1783a4a2ddba8/N0lQ5HAJbNbXIG1MbtB4x.png" alt="deepseek_0628_cpu-iq1s example response">
<figcaption><strong>deepseek_0628_cpu-iq1s (58.42 GiB):</strong> Even the smallest version, IQ1_S, is coherent with these custom quants</figcaption>
</figure>
```bash
# IQ1_S version (58.42 GiB)
aria2c -x 8 -o deepseek_0628_cpu_iq1_s-00001-of-00002.gguf \
https://huggingface.co/nisten/deepseek-0628-gguf/resolve/main/deepseek_0628_cpu_iq1_s-00001-of-00002.gguf
aria2c -x 8 -o deepseek_0628_cpu_iq1_s-00002-of-00002.gguf \
https://huggingface.co/nisten/deepseek-0628-gguf/resolve/main/deepseek_0628_cpu_iq1_s-00002-of-00002.gguf
```
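File size and BPW are tied together by simple arithmetic: BPW = size in bytes × 8 ÷ parameter count. Here's a small bash + awk sketch, assuming DeepSeek-V2's nominal total of roughly 236B parameters. llama.cpp uses its own exact count, so the last digit can drift (IQ2_XXS, for example, comes out at 2.93 here versus the 2.94 in the perplexity log). `bpw_from_gib` and `gib_from_bpw` are hypothetical helper names.

```bash
#!/usr/bin/env bash
# Sketch: convert between GGUF file size (GiB) and average bits per weight.
#   BPW = size_in_bytes * 8 / parameter_count
# Assumption: ~236e9 parameters (DeepSeek-V2's nominal total); llama.cpp uses
# its own exact count, so the last digit can drift slightly.
N_PARAMS=236000000000
GIB=1073741824   # bytes per GiB (2^30)

# bpw_from_gib SIZE_GIB -> average bits per weight
bpw_from_gib() {
  awk -v gib="$1" -v n="$N_PARAMS" -v g="$GIB" 'BEGIN { printf "%.2f\n", gib * g * 8 / n }'
}

# gib_from_bpw BPW -> expected file size in GiB
gib_from_bpw() {
  awk -v bpw="$1" -v n="$N_PARAMS" -v g="$GIB" 'BEGIN { printf "%.2f\n", n * bpw / 8 / g }'
}

bpw_from_gib 58.42   # IQ1_S
bpw_from_gib 73.27   # IQ1M
bpw_from_gib 132.1   # IQ4XM
```

With the sizes above this gives 2.13 BPW for IQ1_S and 2.67 BPW for IQ1M, matching the perplexity log, and about 4.81 BPW for the 132.1 GiB IQ4XM. `gib_from_bpw` runs the other way if you want to estimate a download size before starting it.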
📜 Use of the DeepSeek-V2-Chat-0628 model is subject to the [DeepSeek Model License](https://github.com/deepseek-ai/DeepSeek-V2/blob/main/LICENSE-MODEL). The DeepSeek-V2 series supports commercial use. It's a permissive license that restricts uses such as military purposes, harming minors, or patent trolling; see the license text for the full list.
### 🌟 Model Information
DeepSeek-V2-Chat-0628 was the latest and greatest in the DeepSeek family when these quants were made. This AI powerhouse climbed the LMSYS Chatbot Arena Leaderboard faster than a rocket on steroids:
- 🏆 Overall Arena Ranking: #11 globally
- 💻 Coding Arena Ranking: #3 globally
- 🧠 Hard Prompts Arena Ranking: #7 globally, ahead of Claude 3 Opus even in English-only Hard Prompts
![image/png](https://cdn-uploads.huggingface.co/production/uploads/6379683a81c1783a4a2ddba8/rbdug3j6BaeTSmKLDIp39.png)
Want to seek deeper into this model's ocean of weights? Swim over to the [OG model page](https://huggingface.co/deepseek-ai/DeepSeek-V2-Chat-0628)
Now go forth and accelerate 🚀💡
-Nisten