Update README.md

base_model: [deepseek-ai/DeepSeek-V2-Chat-0628]
#### 🚀 Custom quantizations of DeepSeek-V2-Chat-0628, supercharged for CPU inference of what is currently the #7 model globally on lmsys arena hard! 🖥️

>[!TIP]
>### 🚄 Just download this IQ4XM custom quant, it's the one I personally use:
>
>🐧 On Linux: `sudo apt install -y aria2`
>
>🍎 On Mac: `brew install aria2`

```bash
# 🚀 For the BEST PERFORMANCE / SPEED QUANT THAT I PERSONALLY USE
aria2c -x 8 -o deepseek_0628_cpu_optimized_iq4xm-00001-of-00004.gguf \
  https://huggingface.co/nisten/deepseek-0628-gguf/resolve/main/deepseek_0628_cpu_optimized_iq4xm-00001-of-00004.gguf

aria2c -x 8 -o deepseek_0628_cpu_optimized_iq4xm-00002-of-00004.gguf \
  https://huggingface.co/nisten/deepseek-0628-gguf/resolve/main/deepseek_0628_cpu_optimized_iq4xm-00002-of-00004.gguf

aria2c -x 8 -o deepseek_0628_cpu_optimized_iq4xm-00003-of-00004.gguf \
  https://huggingface.co/nisten/deepseek-0628-gguf/resolve/main/deepseek_0628_cpu_optimized_iq4xm-00003-of-00004.gguf

aria2c -x 8 -o deepseek_0628_cpu_optimized_iq4xm-00004-of-00004.gguf \
  https://huggingface.co/nisten/deepseek-0628-gguf/resolve/main/deepseek_0628_cpu_optimized_iq4xm-00004-of-00004.gguf
```
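The four `aria2c` invocations above differ only in the shard index, so they can also be generated from the common URL pattern. A small dry-run sketch (the `echo` prints each command; drop it to actually download):

```shell
# Dry run: print the four aria2c shard downloads generated from the
# common URL pattern used above (remove "echo" to actually download).
base=https://huggingface.co/nisten/deepseek-0628-gguf/resolve/main
for i in 1 2 3 4; do
  f=$(printf 'deepseek_0628_cpu_optimized_iq4xm-%05d-of-00004.gguf' "$i")
  echo aria2c -x 8 -o "$f" "$base/$f"
done
```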

![image/png](https://cdn-uploads.huggingface.co/production/uploads/6379683a81c1783a4a2ddba8/rbdug3j6BaeTSmKLDIp39.png)

### 🧠 This IQ4XM version combines GGML-type IQ4_XS 4-bit quantization with q8_0 for blazing-fast performance with minimal loss, leveraging the int8 optimizations on most newer server CPUs.
### 🛠️ While it required some custom code wizardry, it's fully compatible with standard llama.cpp from GitHub, or just search for nisten in LM Studio.

>[!TIP]
>📁 No need for file concatenation, just point llama-cli at the first file and watch the magic happen!
>
>💻 Ready to dive in? Here's your command-line spell for interactive mode (prompt.txt is optional, but recommended for maximum sorcery):
>```bash
>./llama-cli --temp 0.4 -m deepseek_0628_cpu_optimized_iq4xm-00001-of-00004.gguf -c 32000 -co -cnv -i -f prompt.txt
>```
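Before launching, it can help to confirm that all four shards finished downloading. A minimal sketch (the glob matches the shard names used above; llama-cli finds parts 2-4 automatically from the first file):

```shell
# Sketch: count how many of the four IQ4XM shards are present and
# non-empty in the current directory before launching llama-cli.
n=0
for f in deepseek_0628_cpu_optimized_iq4xm-0000[1-4]-of-00004.gguf; do
  [ -s "$f" ] && n=$((n + 1))
done
echo "$n of 4 shards ready"
```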

```verilog
// PERPLEXITY BENCHMARKS
// deepseek_0628_cpu_optimized_iq4xm-00001-of-00004.gguf
// the 4-bit iq4xm gets the best perplexity here, but the margin is likely just a rounding error

./llama-perplexity -m ~/r/deepseek_0628_cpu-iq4xm-00001-of-00002.gguf --chunks 12 -f ~/wiki.test.raw

deepseek-0628-bf16-00001-of-00011.gguf
Model size: 440 GiB
perplexity: 735.50 seconds per pass - ETA 36.77 minutes
[1]2.4827,[2]3.3887,[3]2.9470,[4]3.4768,[5]3.9012,[6]4.5128,[7]4.7533,[8]4.9550,[9]5.2863,[10]5.6824,[11]5.7541,[12]5.8734,
Final estimate: PPL = 5.8734 +/- 0.26967

deepseek_0628_cpu-iq1m-00001-of-00002.gguf
model size = 73.27 GiB (2.67 BPW)
perplexity: 96.54 seconds per pass - ETA 4.82 minutes
[1]3.4340,[2]4.5503,[3]4.0118,[4]4.5807,[5]4.9540,[6]5.7353,[7]5.9430,[8]6.1320,[9]6.5690,[10]6.9401,[11]7.0363,[12]7.1857,
Final estimate: PPL = 7.1857 +/- 0.33585

deepseek_0628_cpu_iq1_s-00001-of-00002.gguf
model size = 58.42 GiB (2.13 BPW)
perplexity: 94.39 seconds per pass - ETA 4.72 minutes
[1]3.3257,[2]4.7059,[3]4.3868,[4]4.8870,[5]5.3162,[6]6.0753,[7]6.2931,[8]6.5085,[9]6.8913,[10]7.3148,[11]7.4235,[12]7.6295,
Final estimate: PPL = 7.6295 +/- 0.36143

deepseek_0628_cpu_optimized_iq4xm-00001-of-00004.gguf
size: 131 GB
perplexity: 59.49 seconds per pass - ETA 2.97 minutes
[1]2.4954,[2]3.3941,[3]2.9607,[4]3.4755,[5]3.8889,[6]4.5036,[7]4.7364,[8]4.9401,[9]5.2737,[10]5.6651,[11]5.7354,[12]5.8620,
Final estimate: PPL = 5.8620 +/- 0.26853
```
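To put those figures in perspective, the relative perplexity change of each quant against the bf16 baseline can be computed straight from the "Final estimate" lines above. A quick awk sketch:

```shell
# Relative PPL change of each quant vs. the bf16 baseline (PPL 5.8734),
# using the "Final estimate" figures from the benchmark log above.
awk 'BEGIN {
  base = 5.8734
  printf "iq4xm: %+.2f%%\n", 100 * (5.8620 - base) / base   # -0.19%
  printf "iq1m:  %+.2f%%\n", 100 * (7.1857 - base) / base   # +22.34%
  printf "iq1_s: %+.2f%%\n", 100 * (7.6295 - base) / base   # +29.90%
}'
```

The iq4xm result landing slightly below the bf16 baseline is what the comment in the log means by "likely just a rounding error": the difference is well inside the reported +/- margins.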

```bash
# 🏋️ For the nearly lossless Q8_0 version
aria2c -x 8 -o deepseek-0628-q8_0-00001-of-00006.gguf \