File size: 10,685 Bytes
d17decc 1e349d9 50bf79b 548354d d17decc 124e2e1 1e349d9 d17decc 5bd29cd d17decc 124e2e1 c10d06b d17decc c10d06b d17decc c10d06b d17decc c10d06b d17decc 124e2e1 5bd29cd da062f7 5bd29cd 6a8facf 5bd29cd da062f7 5bd29cd da062f7 5bd29cd da062f7 5bd29cd da062f7 5bd29cd 6ca9816 d17decc dcd4f7a 2498189 d17decc 1c08e29 d17decc |
1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36 37 38 39 40 41 42 43 44 45 46 47 48 49 50 51 52 53 54 55 56 57 58 59 60 61 62 63 64 65 66 67 68 69 70 71 72 73 74 75 76 77 78 79 80 81 82 83 84 85 86 87 88 89 90 91 92 93 94 95 96 97 98 99 100 101 102 103 104 105 106 107 108 109 110 111 112 113 114 115 116 117 118 119 120 121 122 123 124 125 126 127 128 129 130 131 132 133 134 135 136 137 138 139 140 141 142 143 144 145 146 147 148 149 150 151 152 153 154 155 156 157 158 159 160 161 162 163 164 165 166 167 168 169 170 171 172 173 174 175 176 177 178 179 180 181 182 183 184 185 186 187 188 189 190 191 192 193 194 195 196 197 198 199 200 201 202 203 204 205 206 207 |
---
base_model: [deepseek-ai/DeepSeek-V2-Chat-0628]
---
# 🚀 My custom quantizations of [DeepSeek-V2-Chat-0628](https://huggingface.co/deepseek-ai/DeepSeek-V2-Chat-0628) optimized for CPU inference 🖥️
### Currently ranked #7 globally on LMSYS Arena Hard! 🏆
>### 🚄 Just download this IQ4XM 131Gb version, it's the one I use myself in prod:
>🐧 On Linux: `sudo apt install -y aria2`
>
>🍎 On Mac: `brew install aria2`
```verilog
aria2c -x 9 -o deepseek_0628_cpu_optimized_iq4xm-00001-of-00004.gguf \
https://huggingface.co/nisten/deepseek-0628-gguf/resolve/main/deepseek_0628_cpu_optimized_iq4xm-00001-of-00004.gguf
aria2c -x 9 -o deepseek_0628_cpu_optimized_iq4xm-00002-of-00004.gguf \
https://huggingface.co/nisten/deepseek-0628-gguf/resolve/main/deepseek_0628_cpu_optimized_iq4xm-00002-of-00004.gguf
aria2c -x 9 -o deepseek_0628_cpu_optimized_iq4xm-00003-of-00004.gguf \
https://huggingface.co/nisten/deepseek-0628-gguf/resolve/main/deepseek_0628_cpu_optimized_iq4xm-00003-of-00004.gguf
aria2c -x 9 -o deepseek_0628_cpu_optimized_iq4xm-00004-of-00004.gguf \
https://huggingface.co/nisten/deepseek-0628-gguf/resolve/main/deepseek_0628_cpu_optimized_iq4xm-00004-of-00004.gguf
```
>[!TIP]
>//then to have a commandline conversation interface all you need is:
```bash
git clone https://github.com/ggerganov/llama.cpp && cd llama.cpp && make -j
./llama-cli -m ~/r/deepseek_0628_cpu_optimized_iq4xm-00001-of-00004.gguf -t 62 --temp 0.4 -co -cnv -i -c 3000 -p "Adopt the persona of a full-stack developer at NASA JPL."
```
### 🧠 This IQ4XM version uses GGML TYPE IQ_4_XS 4bit in combination with q8_0 bit for blazing fast performance with minimal loss, leveraging int8 optimizations on most newer server CPUs.
### 🛠️ While it required some custom code wizardry, it's fully compatible with standard llama.cpp from GitHub or just search for nisten in lmstudio.
>[!TIP]
>
>📁 No need for file concatenation - just point llama-cli at the first file and watch the magic happen!
>
>💻 Ready to delve in baby? Here's your command-line spell for interactive mode (prompt.txt is optional, but recommended for maximum sorcery):
>```bash
>./llama-cli --temp 0.4 -m deepseek_0628_cpu_optimized_iq4xm-00001-of-00004.gguf -c 32000 -co -cnv -i -f prompt.txt
>```
```verilog
//PERPLEXITY BENCHMARKS,
//./llama-perplexity -m ~/r/deepseek_0628_cpu-iq4xm-00001-of-00002.gguf --chunks 12 -f ~/wiki.test.raw
//the 4bit iq4xm gets best perplexity but it's likely just a rounding error
deepseek-0628-bf16-00001-of-00011.gguf
Model size: 440 Gib
perplexity: 735.50 seconds per pass - ETA 36.77 minutes
[1]2.4827,[2]3.3887,[3]2.9470,[4]3.4768,[5]3.9012,[6]4.5128,[7]4.7533,[8]4.9550,[9]5.2863,[10]5.6824,[11]5.7541,[12]5.8734,
Final estimate: PPL = 5.8734 +/- 0.26967
deepseek_0628_cpu-iq1m-00001-of-00002.gguf
model size = 73.27 GiB (2.67 BPW)
perplexity: 96.54 seconds per pass - ETA 4.82 minutes
[1]3.4340,[2]4.5503,[3]4.0118,[4]4.5807,[5]4.9540,[6]5.7353,[7]5.9430,[8]6.1320,[9]6.5690,[10]6.9401,[11]7.0363,[12]7.1857,
Final estimate: PPL = 7.1857 +/- 0.33585
deepseek_0628_cpu_iq1_s-00001-of-00002.gguf
model size = 58.42 GiB (2.13 BPW)
perplexity: 94.39 seconds per pass - ETA 4.72 minutes
[1]3.3257,[2]4.7059,[3]4.3868,[4]4.8870,[5]5.3162,[6]6.0753,[7]6.2931,[8]6.5085,[9]6.8913,[10]7.3148,[11]7.4235,[12]7.6295,
Final estimate: PPL = 7.6295 +/- 0.36143
deepseek_0628_cpu_optimized_iq4xm-00001-of-00004.gguf
size: 131Gb
perplexity: 59.49 seconds per pass - ETA 2.97 minutes
[1]2.4954,[2]3.3941,[3]2.9607,[4]3.4755,[5]3.8889,[6]4.5036,[7]4.7364,[8]4.9401,[9]5.2737,[10]5.6651,[11]5.7354,[12]5.8620,
Final estimate: PPL = 5.8620 +/- 0.26853
```
>[!TIP]
>### 🚄 More scripts for accelerated downloads::
>
```bash
# 🏋️ For the nearly lossless Q8_0 version
aria2c -x 8 -o deepseek-0628-q8_0-00001-of-00006.gguf \
https://huggingface.co/nisten/deepseek-0628-gguf/resolve/main/deepseek-0628-q8_0-00001-of-00006.gguf
aria2c -x 8 -o deepseek-0628-q8_0-00002-of-00006.gguf \
https://huggingface.co/nisten/deepseek-0628-gguf/resolve/main/deepseek-0628-q8_0-00002-of-00006.gguf
aria2c -x 8 -o deepseek-0628-q8_0-00003-of-00006.gguf \
https://huggingface.co/nisten/deepseek-0628-gguf/resolve/main/deepseek-0628-q8_0-00003-of-00006.gguf
aria2c -x 8 -o deepseek-0628-q8_0-00004-of-00006.gguf \
https://huggingface.co/nisten/deepseek-0628-gguf/resolve/main/deepseek-0628-q8_0-00004-of-00006.gguf
aria2c -x 8 -o deepseek-0628-q8_0-00005-of-00006.gguf \
https://huggingface.co/nisten/deepseek-0628-gguf/resolve/main/deepseek-0628-q8_0-00005-of-00006.gguf
aria2c -x 8 -o deepseek-0628-q8_0-00006-of-00006.gguf \
https://huggingface.co/nisten/deepseek-0628-gguf/resolve/main/deepseek-0628-q8_0-00006-of-00006.gguf
```
```bash
# 🧠 For the full-brain BF16 version
aria2c -x 8 -o deepseek-0628-bf16-00001-of-00011.gguf \
https://huggingface.co/nisten/deepseek-0628-gguf/resolve/main/deepseek-0628-bf16-00001-of-00011.gguf
aria2c -x 8 -o deepseek-0628-bf16-00002-of-00011.gguf \
https://huggingface.co/nisten/deepseek-0628-gguf/resolve/main/deepseek-0628-bf16-00002-of-00011.gguf
aria2c -x 8 -o deepseek-0628-bf16-00003-of-00011.gguf \
https://huggingface.co/nisten/deepseek-0628-gguf/resolve/main/deepseek-0628-bf16-00003-of-00011.gguf
aria2c -x 8 -o deepseek-0628-bf16-00004-of-00011.gguf \
https://huggingface.co/nisten/deepseek-0628-gguf/resolve/main/deepseek-0628-bf16-00004-of-00011.gguf
aria2c -x 8 -o deepseek-0628-bf16-00005-of-00011.gguf \
https://huggingface.co/nisten/deepseek-0628-gguf/resolve/main/deepseek-0628-bf16-00005-of-00011.gguf
aria2c -x 8 -o deepseek-0628-bf16-00006-of-00011.gguf \
https://huggingface.co/nisten/deepseek-0628-gguf/resolve/main/deepseek-0628-bf16-00006-of-00011.gguf
aria2c -x 8 -o deepseek-0628-bf16-00007-of-00011.gguf \
https://huggingface.co/nisten/deepseek-0628-gguf/resolve/main/deepseek-0628-bf16-00007-of-00011.gguf
aria2c -x 8 -o deepseek-0628-bf16-00008-of-00011.gguf \
https://huggingface.co/nisten/deepseek-0628-gguf/resolve/main/deepseek-0628-bf16-00008-of-00011.gguf
aria2c -x 8 -o deepseek-0628-bf16-00009-of-00011.gguf \
https://huggingface.co/nisten/deepseek-0628-gguf/resolve/main/deepseek-0628-bf16-00009-of-00011.gguf
aria2c -x 8 -o deepseek-0628-bf16-00010-of-00011.gguf \
https://huggingface.co/nisten/deepseek-0628-gguf/resolve/main/deepseek-0628-bf16-00010-of-00011.gguf
aria2c -x 8 -o deepseek-0628-bf16-00011-of-00011.gguf \
https://huggingface.co/nisten/deepseek-0628-gguf/resolve/main/deepseek-0628-bf16-00011-of-00011.gguf
```
>[!TIP]
>### 🚄 Even more quantizations for the speed demons and size optimizers:
>
>🧪 Experimental versions - the q1s and q1m 1 bit ( avg 2.1 bpw and 2.6bpw are suprisingly coherent!)
```bash
# 2-bit IQ2_XXS version (86.5GB total)
aria2c -x 8 -o deepseek-0628-cpu-2bit-IQ2_XXS-00001-of-00002.gguf \
https://huggingface.co/nisten/deepseek-0628-gguf/resolve/main/deepseek-0628-cpu-2bit-IQ2_XXS-00001-of-00002.gguf
aria2c -x 8 -o deepseek-0628-cpu-2bit-IQ2_XXS-00002-of-00002.gguf \
https://huggingface.co/nisten/deepseek-0628-gguf/resolve/main/deepseek-0628-cpu-2bit-IQ2_XXS-00002-of-00002.gguf
# Q6K version (200.9GB total)
aria2c -x 8 -o deepseek-0628-cpu-q6k-00001-of-00005.gguf \
https://huggingface.co/nisten/deepseek-0628-gguf/resolve/main/deepseek-0628-cpu-q6k-00001-of-00005.gguf
aria2c -x 8 -o deepseek-0628-cpu-q6k-00002-of-00005.gguf \
https://huggingface.co/nisten/deepseek-0628-gguf/resolve/main/deepseek-0628-cpu-q6k-00002-of-00005.gguf
aria2c -x 8 -o deepseek-0628-cpu-q6k-00003-of-00005.gguf \
https://huggingface.co/nisten/deepseek-0628-gguf/resolve/main/deepseek-0628-cpu-q6k-00003-of-00005.gguf
aria2c -x 8 -o deepseek-0628-cpu-q6k-00004-of-00005.gguf \
https://huggingface.co/nisten/deepseek-0628-gguf/resolve/main/deepseek-0628-cpu-q6k-00004-of-00005.gguf
aria2c -x 8 -o deepseek-0628-cpu-q6k-00005-of-00005.gguf \
https://huggingface.co/nisten/deepseek-0628-gguf/resolve/main/deepseek-0628-cpu-q6k-00005-of-00005.gguf
# Q4_0_8_8 faster but dumber version (181.8GB total)
aria2c -x 8 -o deepseek-0628-q4_0_8_8_faster_dumber-00001-of-00004.gguf \
https://huggingface.co/nisten/deepseek-0628-gguf/resolve/main/deepseek-0628-q4_0_8_8_faster_dumber-00001-of-00004.gguf
aria2c -x 8 -o deepseek-0628-q4_0_8_8_faster_dumber-00002-of-00004.gguf \
https://huggingface.co/nisten/deepseek-0628-gguf/resolve/main/deepseek-0628-q4_0_8_8_faster_dumber-00002-of-00004.gguf
aria2c -x 8 -o deepseek-0628-q4_0_8_8_faster_dumber-00003-of-00004.gguf \
https://huggingface.co/nisten/deepseek-0628-gguf/resolve/main/deepseek-0628-q4_0_8_8_faster_dumber-00003-of-00004.gguf
aria2c -x 8 -o deepseek-0628-q4_0_8_8_faster_dumber-00004-of-00004.gguf \
https://huggingface.co/nisten/deepseek-0628-gguf/resolve/main/deepseek-0628-q4_0_8_8_faster_dumber-00004-of-00004.gguf
# IQ1M version (78.7GB total)
aria2c -x 8 -o deepseek_0628_cpu-iq1m-00001-of-00002.gguf \
https://huggingface.co/nisten/deepseek-0628-gguf/resolve/main/deepseek_0628_cpu-iq1m-00001-of-00002.gguf
aria2c -x 8 -o deepseek_0628_cpu-iq1m-00002-of-00002.gguf \
https://huggingface.co/nisten/deepseek-0628-gguf/resolve/main/deepseek_0628_cpu-iq1m-00002-of-00002.gguf
# IQ1_S version (62.7GB total)
aria2c -x 8 -o deepseek_0628_cpu_iq1_s-00001-of-00002.gguf \
https://huggingface.co/nisten/deepseek-0628-gguf/resolve/main/deepseek_0628_cpu_iq1_s-00001-of-00002.gguf
aria2c -x 8 -o deepseek_0628_cpu_iq1_s-00002-of-00002.gguf \
https://huggingface.co/nisten/deepseek-0628-gguf/resolve/main/deepseek_0628_cpu_iq1_s-00002-of-00002.gguf
```
📜 The use of DeepSeek-V2-Chat-0628 model is subject to the [DeepSeek Model License](https://github.com/deepseek-ai/DeepSeek-V2/blob/main/LICENSE-MODEL). DeepSeek-V2 series supports commercial use. It's a permissive license that only restricts use for military purposes, harming minors, or patent trolling.
### 🌟 Model Information
DeepSeek-V2-Chat-0628 is the latest and greatest in the DeepSeek family. This AI powerhouse has climbed the LMSYS Chatbot Arena Leaderboard faster than a rocket on steroids:
- 🏆 Overall Arena Ranking: #11 global
- 💻 Coding Arena Ranking: #3, global
- 🧠 Hard Prompts Arena Ranking: #7 global, better than claude opus even in english only hard-prompts
![image/png](https://cdn-uploads.huggingface.co/production/uploads/6379683a81c1783a4a2ddba8/rbdug3j6BaeTSmKLDIp39.png)
Want to seek deeper into this model's ocean of awesomeness? Swim over to the [original model card](https://huggingface.co/deepseek-ai/DeepSeek-V2-Chat-0628) and prepare to have your mind blown! 🤯
Now go forth and accelerate 🚀💡
-Nisten
|