Update README.md

base_model: [deepseek-ai/DeepSeek-V2-Chat-0628]
#### 🚀 Custom quantizations of DeepSeek-V2-Chat-0628, supercharged for CPU inference of what is currently the #7 model globally on lmsys arena hard! 🖥️

>[!TIP]
>### 🚄 Just download this IQ4XM custom quant, it's the one I personally use:
>
>🐧 On Linux: `sudo apt install -y aria2`
>
>🍎 On Mac: `brew install aria2`

```bash
# 🚀 For the BEST PERFORMANCE / SPEED QUANT THAT I PERSONALLY USE
aria2c -x 8 -o deepseek_0628_cpu_optimized_iq4xm-00001-of-00004.gguf \
  https://huggingface.co/nisten/deepseek-0628-gguf/resolve/main/deepseek_0628_cpu_optimized_iq4xm-00001-of-00004.gguf

aria2c -x 8 -o deepseek_0628_cpu_optimized_iq4xm-00002-of-00004.gguf \
  https://huggingface.co/nisten/deepseek-0628-gguf/resolve/main/deepseek_0628_cpu_optimized_iq4xm-00002-of-00004.gguf

aria2c -x 8 -o deepseek_0628_cpu_optimized_iq4xm-00003-of-00004.gguf \
  https://huggingface.co/nisten/deepseek-0628-gguf/resolve/main/deepseek_0628_cpu_optimized_iq4xm-00003-of-00004.gguf

aria2c -x 8 -o deepseek_0628_cpu_optimized_iq4xm-00004-of-00004.gguf \
  https://huggingface.co/nisten/deepseek-0628-gguf/resolve/main/deepseek_0628_cpu_optimized_iq4xm-00004-of-00004.gguf
```
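The four `aria2c` invocations above differ only in the shard index, so they can also be generated from the common URL pattern. A small dry-run sketch (the `echo` prints each command; drop it to actually download):

```shell
# Dry run: print the four aria2c shard downloads generated from the
# common URL pattern used above (remove "echo" to actually download).
base=https://huggingface.co/nisten/deepseek-0628-gguf/resolve/main
for i in 1 2 3 4; do
  f=$(printf 'deepseek_0628_cpu_optimized_iq4xm-%05d-of-00004.gguf' "$i")
  echo aria2c -x 8 -o "$f" "$base/$f"
done
```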

![image/png](https://cdn-uploads.huggingface.co/production/uploads/6379683a81c1783a4a2ddba8/rbdug3j6BaeTSmKLDIp39.png)

### 🧠 This IQ4XM version combines GGML-type IQ4_XS 4-bit quantization with q8_0 for blazing-fast performance with minimal loss, leveraging the int8 optimizations on most newer server CPUs.
### 🛠️ While it required some custom code wizardry, it's fully compatible with standard llama.cpp from GitHub, or just search for nisten in LM Studio.

>[!TIP]
>📁 No need for file concatenation, just point llama-cli at the first file and watch the magic happen!
>
>💻 Ready to dive in? Here's your command-line spell for interactive mode (prompt.txt is optional, but recommended for maximum sorcery):
>```bash
>./llama-cli --temp 0.4 -m deepseek_0628_cpu_optimized_iq4xm-00001-of-00004.gguf -c 32000 -co -cnv -i -f prompt.txt
>```
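Before launching, it can help to confirm that all four shards finished downloading. A minimal sketch (the glob matches the shard names used above; llama-cli finds parts 2-4 automatically from the first file):

```shell
# Sketch: count how many of the four IQ4XM shards are present and
# non-empty in the current directory before launching llama-cli.
n=0
for f in deepseek_0628_cpu_optimized_iq4xm-0000[1-4]-of-00004.gguf; do
  [ -s "$f" ] && n=$((n + 1))
done
echo "$n of 4 shards ready"
```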

```verilog
// PERPLEXITY BENCHMARKS
// deepseek_0628_cpu_optimized_iq4xm-00001-of-00004.gguf
// the 4-bit iq4xm gets the best perplexity here, but the margin is likely just a rounding error

./llama-perplexity -m ~/r/deepseek_0628_cpu-iq4xm-00001-of-00002.gguf --chunks 12 -f ~/wiki.test.raw

deepseek-0628-bf16-00001-of-00011.gguf
Model size: 440 GiB
perplexity: 735.50 seconds per pass - ETA 36.77 minutes
[1]2.4827,[2]3.3887,[3]2.9470,[4]3.4768,[5]3.9012,[6]4.5128,[7]4.7533,[8]4.9550,[9]5.2863,[10]5.6824,[11]5.7541,[12]5.8734,
Final estimate: PPL = 5.8734 +/- 0.26967

deepseek_0628_cpu-iq1m-00001-of-00002.gguf
model size = 73.27 GiB (2.67 BPW)
perplexity: 96.54 seconds per pass - ETA 4.82 minutes
[1]3.4340,[2]4.5503,[3]4.0118,[4]4.5807,[5]4.9540,[6]5.7353,[7]5.9430,[8]6.1320,[9]6.5690,[10]6.9401,[11]7.0363,[12]7.1857,
Final estimate: PPL = 7.1857 +/- 0.33585

deepseek_0628_cpu_iq1_s-00001-of-00002.gguf
model size = 58.42 GiB (2.13 BPW)
perplexity: 94.39 seconds per pass - ETA 4.72 minutes
[1]3.3257,[2]4.7059,[3]4.3868,[4]4.8870,[5]5.3162,[6]6.0753,[7]6.2931,[8]6.5085,[9]6.8913,[10]7.3148,[11]7.4235,[12]7.6295,
Final estimate: PPL = 7.6295 +/- 0.36143

deepseek_0628_cpu_optimized_iq4xm-00001-of-00004.gguf
size: 131 GB
perplexity: 59.49 seconds per pass - ETA 2.97 minutes
[1]2.4954,[2]3.3941,[3]2.9607,[4]3.4755,[5]3.8889,[6]4.5036,[7]4.7364,[8]4.9401,[9]5.2737,[10]5.6651,[11]5.7354,[12]5.8620,
Final estimate: PPL = 5.8620 +/- 0.26853
```
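To put those figures in perspective, the relative perplexity change of each quant against the bf16 baseline can be computed straight from the "Final estimate" lines above. A quick awk sketch:

```shell
# Relative PPL change of each quant vs. the bf16 baseline (PPL 5.8734),
# using the "Final estimate" figures from the benchmark log above.
awk 'BEGIN {
  base = 5.8734
  printf "iq4xm: %+.2f%%\n", 100 * (5.8620 - base) / base   # -0.19%
  printf "iq1m:  %+.2f%%\n", 100 * (7.1857 - base) / base   # +22.34%
  printf "iq1_s: %+.2f%%\n", 100 * (7.6295 - base) / base   # +29.90%
}'
```

The iq4xm result landing slightly below the bf16 baseline is what the comment in the log means by "likely just a rounding error": the difference is well inside the reported +/- margins.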

```bash
# 🏋️ For the nearly lossless Q8_0 version
aria2c -x 8 -o deepseek-0628-q8_0-00001-of-00006.gguf \