base_model:
  - deepseek-ai/DeepSeek-V2-Chat-0628

🚀 My custom quantizations of DeepSeek-V2-Chat-0628, optimized for CPU inference 🖥️

Currently ranked #7 globally on LMSYS Arena Hard! 🏆

🚄 Just download this IQ4XM version (131 GB); it's the one I use myself in prod:

🐧 On Linux: sudo apt install -y aria2

🍎 On Mac: brew install aria2

You can paste these in all at once or in multiple steps; it doesn't matter.

aria2c -x 9 -o deepseek_0628_cpu_optimized_iq4xm-00001-of-00004.gguf \
  https://huggingface.co/nisten/deepseek-0628-gguf/resolve/main/deepseek_0628_cpu_optimized_iq4xm-00001-of-00004.gguf

aria2c -x 9 -o deepseek_0628_cpu_optimized_iq4xm-00002-of-00004.gguf \
  https://huggingface.co/nisten/deepseek-0628-gguf/resolve/main/deepseek_0628_cpu_optimized_iq4xm-00002-of-00004.gguf

aria2c -x 9 -o deepseek_0628_cpu_optimized_iq4xm-00003-of-00004.gguf \
  https://huggingface.co/nisten/deepseek-0628-gguf/resolve/main/deepseek_0628_cpu_optimized_iq4xm-00003-of-00004.gguf

aria2c -x 9 -o deepseek_0628_cpu_optimized_iq4xm-00004-of-00004.gguf \
  https://huggingface.co/nisten/deepseek-0628-gguf/resolve/main/deepseek_0628_cpu_optimized_iq4xm-00004-of-00004.gguf
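
If you'd rather not paste four near-identical commands, the same downloads can be sketched as a loop. This is a minimal POSIX sh sketch, not the author's script: `shard_names` and `download_shards` are hypothetical helper names, and the only aria2c flags used are `-x`, `-o`, and `-c` (resume a partial file).

```shell
#!/bin/sh
# Sketch: generate the shard file names and download them one by one.
REPO_BASE="https://huggingface.co/nisten/deepseek-0628-gguf/resolve/main"

# shard_names PREFIX TOTAL: print PREFIX-0000i-of-0000TOTAL.gguf, one per line.
# (Hypothetical helper, not part of aria2 or llama.cpp.)
shard_names() (
  prefix="$1"
  total="$2"
  i=1
  while [ "$i" -le "$total" ]; do
    printf '%s-%05d-of-%05d.gguf\n' "$prefix" "$i" "$total"
    i=$((i + 1))
  done
)

# download_shards PREFIX TOTAL: fetch every shard; -c resumes interrupted files.
download_shards() (
  for name in $(shard_names "$1" "$2"); do
    aria2c -c -x 9 -o "$name" "$REPO_BASE/$name" || exit 1
  done
)

# Usage (commented out so that sourcing this file downloads nothing):
# download_shards deepseek_0628_cpu_optimized_iq4xm 4
```

The same helper should work for the other quantizations in this repo by changing the prefix and shard count, for example `download_shards deepseek-0628-q8_0 6`.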

# Then, to get a command-line conversation interface, all you need is:

git clone https://github.com/ggerganov/llama.cpp && cd llama.cpp && make -j
./llama-cli -m ~/r/deepseek_0628_cpu_optimized_iq4xm-00001-of-00004.gguf -t 62 --temp 0.4 -co -cnv -i -c 3000 -p "Adopt the persona of a full-stack developer at NASA JPL."
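
The `-t 62` in the command above is a thread count that fits the author's machine; yours will likely differ. As a rough sketch, you could derive it from the number of online cores. The "leave two cores free" rule here is an assumption for illustration, not a tuned recommendation, and `pick_threads` is a hypothetical helper name.

```shell
#!/bin/sh
# pick_threads [CORES]: print a thread count for llama-cli's -t flag.
# Assumption: leave two cores free for the OS; never go below 1.
pick_threads() (
  cores="${1:-$(getconf _NPROCESSORS_ONLN)}"
  if [ "$cores" -gt 2 ]; then
    echo $((cores - 2))
  else
    echo 1
  fi
)

# Usage (commented out; adjust the model path to wherever you downloaded it):
# ./llama-cli -m deepseek_0628_cpu_optimized_iq4xm-00001-of-00004.gguf \
#   -t "$(pick_threads)" --temp 0.4 -co -cnv -i -c 3000
```

It's worth trying a few values around the result, since the fastest thread count depends on the CPU and memory layout.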

🧠 This IQ4XM version uses GGML type IQ4_XS (4-bit) in combination with Q8_0 (8-bit) for blazing fast performance with minimal loss, leveraging int8 optimizations on most newer server CPUs.

🛠️ While it required some custom code wizardry, it's fully compatible with standard llama.cpp from GitHub. Or just search for nisten in LM Studio.

📁 No need for file concatenation - just point llama-cli at the first file and watch the magic happen!
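
Pointing llama-cli at the first file only works if the remaining shards sit next to it under their original names. Before launching, a quick sanity check can be sketched like this. `missing_shards` is a hypothetical helper; it only assumes the `-00001-of-0000N.gguf` naming used by the files in this repo.

```shell
#!/bin/sh
# missing_shards FIRST_FILE: print the path of every shard that is not on disk.
# FIRST_FILE must end in -00001-of-NNNNN.gguf; prints nothing if all are present.
missing_shards() (
  first="$1"
  case "$first" in
    *-00001-of-*.gguf) ;;
    *) echo "not a first shard: $first" >&2; exit 1 ;;
  esac
  prefix="${first%-00001-of-*.gguf}"   # everything before the shard counter
  total="${first##*-of-}"              # e.g. "00004.gguf"
  total="${total%.gguf}"               # e.g. "00004"
  count=$(printf '%s\n' "$total" | sed 's/^0*//')   # e.g. "4"
  i=1
  while [ "$i" -le "$count" ]; do
    shard=$(printf '%s-%05d-of-%s.gguf' "$prefix" "$i" "$total")
    [ -f "$shard" ] || printf '%s\n' "$shard"
    i=$((i + 1))
  done
)

# Usage (commented out):
# missing_shards deepseek_0628_cpu_optimized_iq4xm-00001-of-00004.gguf
```

An empty result means every shard is in place. The check looks at file names only, so it will not catch a truncated download.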

💻 Ready to delve in, baby? Here's your command-line spell for interactive mode (prompt.txt is optional, but recommended for maximum sorcery):

./llama-cli --temp 0.4 -m deepseek_0628_cpu_optimized_iq4xm-00001-of-00004.gguf -c 32000 -co -cnv -i -f prompt.txt

# Perplexity benchmarks
# ./llama-perplexity -m ~/r/deepseek_0628_cpu-iq4xm-00001-of-00002.gguf --chunks 12 -f ~/wiki.test.raw
# The 4-bit IQ4XM gets the best perplexity, but its lead over BF16 (5.8620 vs 5.8734) is far smaller than the ±0.27 error estimate, so treat the two as equal.

deepseek-0628-bf16-00001-of-00011.gguf
Model size: 440 GiB
perplexity: 735.50 seconds per pass - ETA 36.77 minutes
[1]2.4827,[2]3.3887,[3]2.9470,[4]3.4768,[5]3.9012,[6]4.5128,[7]4.7533,[8]4.9550,[9]5.2863,[10]5.6824,[11]5.7541,[12]5.8734,
Final estimate: PPL = 5.8734 +/- 0.26967

deepseek_0628_cpu-iq1m-00001-of-00002.gguf 
model size       = 73.27 GiB (2.67 BPW) 
perplexity: 96.54 seconds per pass - ETA 4.82 minutes
[1]3.4340,[2]4.5503,[3]4.0118,[4]4.5807,[5]4.9540,[6]5.7353,[7]5.9430,[8]6.1320,[9]6.5690,[10]6.9401,[11]7.0363,[12]7.1857,
Final estimate: PPL = 7.1857 +/- 0.33585

deepseek_0628_cpu_iq1_s-00001-of-00002.gguf 
model size       = 58.42 GiB (2.13 BPW)
perplexity: 94.39 seconds per pass - ETA 4.72 minutes
[1]3.3257,[2]4.7059,[3]4.3868,[4]4.8870,[5]5.3162,[6]6.0753,[7]6.2931,[8]6.5085,[9]6.8913,[10]7.3148,[11]7.4235,[12]7.6295,
Final estimate: PPL = 7.6295 +/- 0.36143

deepseek_0628_cpu_optimized_iq4xm-00001-of-00004.gguf
size: 131 GB
perplexity: 59.49 seconds per pass - ETA 2.97 minutes
[1]2.4954,[2]3.3941,[3]2.9607,[4]3.4755,[5]3.8889,[6]4.5036,[7]4.7364,[8]4.9401,[9]5.2737,[10]5.6651,[11]5.7354,[12]5.8620,
Final estimate: PPL = 5.8620 +/- 0.26853
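
To put those final estimates side by side, here is a small sketch that computes each quantization's perplexity change relative to BF16 and checks whether the gap is inside the reported error. The helper names are hypothetical; the numbers are the ones from the runs above, and only 12 chunks were evaluated, so treat the percentages as rough.

```shell
#!/bin/sh
# ppl_delta BASE QUANT: percentage change in perplexity relative to BASE.
ppl_delta() {
  awk -v base="$1" -v quant="$2" \
    'BEGIN { printf "%.2f\n", (quant - base) / base * 100 }'
}

# within_error BASE QUANT ERR: succeed if |QUANT - BASE| <= ERR.
within_error() {
  awk -v base="$1" -v quant="$2" -v err="$3" \
    'BEGIN { d = quant - base; if (d < 0) d = -d; exit !(d <= err) }'
}

ppl_delta 5.8734 5.8620   # IQ4XM vs BF16 -> -0.19
ppl_delta 5.8734 7.1857   # IQ1M  vs BF16 -> 22.34
ppl_delta 5.8734 7.6295   # IQ1_S vs BF16 -> 29.90
```

For example, `within_error 5.8734 5.8620 0.26967` succeeds, which is why the IQ4XM result reads as a tie with BF16 rather than a win.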

🚄 More scripts for accelerated downloads:

# 🏋️ For the nearly lossless Q8_0 version
aria2c -x 8 -o deepseek-0628-q8_0-00001-of-00006.gguf \
  https://huggingface.co/nisten/deepseek-0628-gguf/resolve/main/deepseek-0628-q8_0-00001-of-00006.gguf

aria2c -x 8 -o deepseek-0628-q8_0-00002-of-00006.gguf \
  https://huggingface.co/nisten/deepseek-0628-gguf/resolve/main/deepseek-0628-q8_0-00002-of-00006.gguf

aria2c -x 8 -o deepseek-0628-q8_0-00003-of-00006.gguf \
  https://huggingface.co/nisten/deepseek-0628-gguf/resolve/main/deepseek-0628-q8_0-00003-of-00006.gguf

aria2c -x 8 -o deepseek-0628-q8_0-00004-of-00006.gguf \
  https://huggingface.co/nisten/deepseek-0628-gguf/resolve/main/deepseek-0628-q8_0-00004-of-00006.gguf

aria2c -x 8 -o deepseek-0628-q8_0-00005-of-00006.gguf \
  https://huggingface.co/nisten/deepseek-0628-gguf/resolve/main/deepseek-0628-q8_0-00005-of-00006.gguf

aria2c -x 8 -o deepseek-0628-q8_0-00006-of-00006.gguf \
  https://huggingface.co/nisten/deepseek-0628-gguf/resolve/main/deepseek-0628-q8_0-00006-of-00006.gguf

# 🧠 For the full-brain BF16 version
aria2c -x 8 -o deepseek-0628-bf16-00001-of-00011.gguf \
  https://huggingface.co/nisten/deepseek-0628-gguf/resolve/main/deepseek-0628-bf16-00001-of-00011.gguf

aria2c -x 8 -o deepseek-0628-bf16-00002-of-00011.gguf \
  https://huggingface.co/nisten/deepseek-0628-gguf/resolve/main/deepseek-0628-bf16-00002-of-00011.gguf

aria2c -x 8 -o deepseek-0628-bf16-00003-of-00011.gguf \
  https://huggingface.co/nisten/deepseek-0628-gguf/resolve/main/deepseek-0628-bf16-00003-of-00011.gguf

aria2c -x 8 -o deepseek-0628-bf16-00004-of-00011.gguf \
  https://huggingface.co/nisten/deepseek-0628-gguf/resolve/main/deepseek-0628-bf16-00004-of-00011.gguf

aria2c -x 8 -o deepseek-0628-bf16-00005-of-00011.gguf \
  https://huggingface.co/nisten/deepseek-0628-gguf/resolve/main/deepseek-0628-bf16-00005-of-00011.gguf

aria2c -x 8 -o deepseek-0628-bf16-00006-of-00011.gguf \
  https://huggingface.co/nisten/deepseek-0628-gguf/resolve/main/deepseek-0628-bf16-00006-of-00011.gguf

aria2c -x 8 -o deepseek-0628-bf16-00007-of-00011.gguf \
  https://huggingface.co/nisten/deepseek-0628-gguf/resolve/main/deepseek-0628-bf16-00007-of-00011.gguf

aria2c -x 8 -o deepseek-0628-bf16-00008-of-00011.gguf \
  https://huggingface.co/nisten/deepseek-0628-gguf/resolve/main/deepseek-0628-bf16-00008-of-00011.gguf

aria2c -x 8 -o deepseek-0628-bf16-00009-of-00011.gguf \
  https://huggingface.co/nisten/deepseek-0628-gguf/resolve/main/deepseek-0628-bf16-00009-of-00011.gguf

aria2c -x 8 -o deepseek-0628-bf16-00010-of-00011.gguf \
  https://huggingface.co/nisten/deepseek-0628-gguf/resolve/main/deepseek-0628-bf16-00010-of-00011.gguf

aria2c -x 8 -o deepseek-0628-bf16-00011-of-00011.gguf \
  https://huggingface.co/nisten/deepseek-0628-gguf/resolve/main/deepseek-0628-bf16-00011-of-00011.gguf
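
The BF16 set is about 440 GiB (roughly 473 GB), so it may be worth checking free disk space before starting. A minimal sketch using `df -Pk`; `free_gb` and `has_space` are hypothetical helper names, and the check assumes the mount point name contains no spaces.

```shell
#!/bin/sh
# free_gb DIR: free space on DIR's filesystem in decimal GB, rounded down.
free_gb() {
  df -Pk "$1" | awk 'NR == 2 { printf "%d\n", $4 * 1024 / 1000000000 }'
}

# has_space NEEDED_GB DIR: succeed if DIR has at least NEEDED_GB free.
has_space() {
  [ "$(free_gb "$2")" -ge "$1" ]
}

# Usage (commented out): only start the BF16 download if it can fit.
# has_space 473 . || echo "not enough free space for BF16" >&2
```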

🚄 Even more quantizations for the speed demons and size optimizers:

🧪 Experimental versions: the 1-bit IQ1_S and IQ1M quants (averaging 2.13 and 2.67 bpw) are surprisingly coherent!

# 2-bit IQ2_XXS version (86.5GB total)
aria2c -x 8 -o deepseek-0628-cpu-2bit-IQ2_XXS-00001-of-00002.gguf \
  https://huggingface.co/nisten/deepseek-0628-gguf/resolve/main/deepseek-0628-cpu-2bit-IQ2_XXS-00001-of-00002.gguf

aria2c -x 8 -o deepseek-0628-cpu-2bit-IQ2_XXS-00002-of-00002.gguf \
  https://huggingface.co/nisten/deepseek-0628-gguf/resolve/main/deepseek-0628-cpu-2bit-IQ2_XXS-00002-of-00002.gguf

# Q6K version (200.9GB total)
aria2c -x 8 -o deepseek-0628-cpu-q6k-00001-of-00005.gguf \
  https://huggingface.co/nisten/deepseek-0628-gguf/resolve/main/deepseek-0628-cpu-q6k-00001-of-00005.gguf

aria2c -x 8 -o deepseek-0628-cpu-q6k-00002-of-00005.gguf \
  https://huggingface.co/nisten/deepseek-0628-gguf/resolve/main/deepseek-0628-cpu-q6k-00002-of-00005.gguf

aria2c -x 8 -o deepseek-0628-cpu-q6k-00003-of-00005.gguf \
  https://huggingface.co/nisten/deepseek-0628-gguf/resolve/main/deepseek-0628-cpu-q6k-00003-of-00005.gguf

aria2c -x 8 -o deepseek-0628-cpu-q6k-00004-of-00005.gguf \
  https://huggingface.co/nisten/deepseek-0628-gguf/resolve/main/deepseek-0628-cpu-q6k-00004-of-00005.gguf

aria2c -x 8 -o deepseek-0628-cpu-q6k-00005-of-00005.gguf \
  https://huggingface.co/nisten/deepseek-0628-gguf/resolve/main/deepseek-0628-cpu-q6k-00005-of-00005.gguf

# Q4_0_8_8 faster but dumber version (181.8GB total)
aria2c -x 8 -o deepseek-0628-q4_0_8_8_faster_dumber-00001-of-00004.gguf \
  https://huggingface.co/nisten/deepseek-0628-gguf/resolve/main/deepseek-0628-q4_0_8_8_faster_dumber-00001-of-00004.gguf

aria2c -x 8 -o deepseek-0628-q4_0_8_8_faster_dumber-00002-of-00004.gguf \
  https://huggingface.co/nisten/deepseek-0628-gguf/resolve/main/deepseek-0628-q4_0_8_8_faster_dumber-00002-of-00004.gguf

aria2c -x 8 -o deepseek-0628-q4_0_8_8_faster_dumber-00003-of-00004.gguf \
  https://huggingface.co/nisten/deepseek-0628-gguf/resolve/main/deepseek-0628-q4_0_8_8_faster_dumber-00003-of-00004.gguf

aria2c -x 8 -o deepseek-0628-q4_0_8_8_faster_dumber-00004-of-00004.gguf \
  https://huggingface.co/nisten/deepseek-0628-gguf/resolve/main/deepseek-0628-q4_0_8_8_faster_dumber-00004-of-00004.gguf

# IQ1M version (78.7GB total)
aria2c -x 8 -o deepseek_0628_cpu-iq1m-00001-of-00002.gguf \
  https://huggingface.co/nisten/deepseek-0628-gguf/resolve/main/deepseek_0628_cpu-iq1m-00001-of-00002.gguf

aria2c -x 8 -o deepseek_0628_cpu-iq1m-00002-of-00002.gguf \
  https://huggingface.co/nisten/deepseek-0628-gguf/resolve/main/deepseek_0628_cpu-iq1m-00002-of-00002.gguf

# IQ1_S version (62.7GB total)
aria2c -x 8 -o deepseek_0628_cpu_iq1_s-00001-of-00002.gguf \
  https://huggingface.co/nisten/deepseek-0628-gguf/resolve/main/deepseek_0628_cpu_iq1_s-00001-of-00002.gguf

aria2c -x 8 -o deepseek_0628_cpu_iq1_s-00002-of-00002.gguf \
  https://huggingface.co/nisten/deepseek-0628-gguf/resolve/main/deepseek_0628_cpu_iq1_s-00002-of-00002.gguf
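
A note on units: the totals in the headings above appear to be decimal GB, while the llama.cpp logs in the perplexity section report GiB. The conversion can be sketched as follows (`gib_to_gb` is a hypothetical helper), and it lines up with the IQ1M and IQ1_S figures.

```shell
#!/bin/sh
# gib_to_gb GIB: convert binary gibibytes (2^30 bytes) to decimal gigabytes (10^9 bytes).
gib_to_gb() {
  awk -v gib="$1" 'BEGIN { printf "%.1f\n", gib * 1024 * 1024 * 1024 / 1e9 }'
}

gib_to_gb 73.27   # IQ1M  log size -> 78.7, matching "78.7GB total"
gib_to_gb 58.42   # IQ1_S log size -> 62.7, matching "62.7GB total"
```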

📜 Use of the DeepSeek-V2-Chat-0628 model is subject to the DeepSeek Model License. The DeepSeek-V2 series supports commercial use. It's a permissive license that only restricts use for military purposes, harming minors, or patent trolling.

🌟 Model Information

DeepSeek-V2-Chat-0628 is the latest and greatest in the DeepSeek family. This AI powerhouse has climbed the LMSYS Chatbot Arena Leaderboard faster than a rocket on steroids:

  • 🏆 Overall Arena Ranking: #11 global
  • 💻 Coding Arena Ranking: #3 global
  • 🧠 Hard Prompts Arena Ranking: #7 global, better than Claude Opus even on English-only hard prompts

Want to seek deeper into this model's ocean of awesomeness? Swim over to the original model card and prepare to have your mind blown! 🤯

Now go forth and accelerate 🚀💡

-Nisten