nisten committed on
Commit 5bd29cd
1 Parent(s): 6bc5aa8

Update README.md

Files changed (1): README.md (+47 -26)

base_model: [deepseek-ai/DeepSeek-V2-Chat-0628]

#### 🚀 Custom quantizations of DeepSeek-V2-Chat-0628, supercharged for CPU inference of what is currently the #7 model globally on LMSYS Arena Hard! 🖥️

>[!TIP]
>### 🚄 Just download this IQ4XM custom quant; it's the one I personally use:
>
>🐧 On Linux: `sudo apt install -y aria2`
>
>🍎 On Mac: `brew install aria2`
>
```bash
aria2c -x 8 -o deepseek_0628_cpu_optimized_iq4xm-00001-of-00004.gguf \
https://huggingface.co/nisten/deepseek-0628-gguf/resolve/main/deepseek_0628_cpu_optimized_iq4xm-00001-of-00004.gguf

aria2c -x 8 -o deepseek_0628_cpu_optimized_iq4xm-00002-of-00004.gguf \
https://huggingface.co/nisten/deepseek-0628-gguf/resolve/main/deepseek_0628_cpu_optimized_iq4xm-00002-of-00004.gguf

aria2c -x 8 -o deepseek_0628_cpu_optimized_iq4xm-00003-of-00004.gguf \
https://huggingface.co/nisten/deepseek-0628-gguf/resolve/main/deepseek_0628_cpu_optimized_iq4xm-00003-of-00004.gguf

aria2c -x 8 -o deepseek_0628_cpu_optimized_iq4xm-00004-of-00004.gguf \
https://huggingface.co/nisten/deepseek-0628-gguf/resolve/main/deepseek_0628_cpu_optimized_iq4xm-00004-of-00004.gguf
```
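
If you'd rather pull all four splits with one command, the huggingface_hub CLI can pattern-match them. This is just a sketch, not part of the original card; it assumes a recent huggingface_hub release where `huggingface-cli download` supports `--include`:

```bash
# Sketch: fetch every IQ4XM split in one go
# (assumes huggingface-cli from a recent huggingface_hub release)
pip install -U huggingface_hub
huggingface-cli download nisten/deepseek-0628-gguf \
  --include "deepseek_0628_cpu_optimized_iq4xm-*.gguf" \
  --local-dir .
```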

![image/png](https://cdn-uploads.huggingface.co/production/uploads/6379683a81c1783a4a2ddba8/rbdug3j6BaeTSmKLDIp39.png)

### 🧠 This IQ4XM version combines the GGML IQ4_XS 4-bit type with q8_0 8-bit tensors for blazing-fast performance with minimal loss, leveraging int8 optimizations on most newer server CPUs.
### 🛠️ While it required some custom code wizardry, it's fully compatible with standard llama.cpp from GitHub, or just search for "nisten" in LM Studio.
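
The card doesn't publish the exact mixing recipe, but as a rough illustration, here is a hedged sketch of how a mixed IQ4_XS/q8_0 quant can be produced with llama.cpp's llama-quantize. The tool name and flags come from recent llama.cpp builds, and the bf16 input path is hypothetical; the actual recipe used for this repo may differ:

```bash
# Sketch only: quantize the body to IQ4_XS while keeping the output and
# token-embedding tensors at q8_0. Input filename is hypothetical.
./llama-quantize \
  --output-tensor-type q8_0 \
  --token-embedding-type q8_0 \
  deepseek-0628-bf16.gguf \
  deepseek_0628_cpu_optimized_iq4xm.gguf \
  IQ4_XS 8
```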

>[!TIP]
>
>📁 No need for file concatenation: just point llama-cli at the first split file and it picks up the rest automatically!
>
>💻 Ready to dive in? Here's your command-line spell for interactive mode (prompt.txt is optional, but recommended for maximum sorcery):
>```bash
>./llama-cli --temp 0.4 -m deepseek_0628_cpu_optimized_iq4xm-00001-of-00004.gguf -c 32000 -co -cnv -i -f prompt.txt
>```
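
If you ever do want a single merged file, for tools that can't read split GGUFs, llama.cpp ships a split/merge utility. A minimal sketch, assuming a recent build where the binary is named llama-gguf-split and with an output filename of your choice:

```bash
# Sketch: merge the four splits into one GGUF (optional; llama-cli loads
# split files directly, so this is only for tools that require one file)
./llama-gguf-split --merge \
  deepseek_0628_cpu_optimized_iq4xm-00001-of-00004.gguf \
  deepseek_0628_cpu_optimized_iq4xm-merged.gguf
```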

```verilog
// PERPLEXITY BENCHMARKS
// deepseek_0628_cpu_optimized_iq4xm-00001-of-00004.gguf
// The 4-bit iq4xm gets the best perplexity, but its edge over bf16 is likely
// just a rounding error.

./llama-perplexity -m ~/r/deepseek_0628_cpu-iq4xm-00001-of-00002.gguf --chunks 12 -f ~/wiki.test.raw

deepseek-0628-bf16-00001-of-00011.gguf
model size: 440 GiB
perplexity: 735.50 seconds per pass - ETA 36.77 minutes
[1]2.4827,[2]3.3887,[3]2.9470,[4]3.4768,[5]3.9012,[6]4.5128,[7]4.7533,[8]4.9550,[9]5.2863,[10]5.6824,[11]5.7541,[12]5.8734,
Final estimate: PPL = 5.8734 +/- 0.26967

deepseek_0628_cpu-iq1m-00001-of-00002.gguf
model size = 73.27 GiB (2.67 BPW)
perplexity: 96.54 seconds per pass - ETA 4.82 minutes
[1]3.4340,[2]4.5503,[3]4.0118,[4]4.5807,[5]4.9540,[6]5.7353,[7]5.9430,[8]6.1320,[9]6.5690,[10]6.9401,[11]7.0363,[12]7.1857,
Final estimate: PPL = 7.1857 +/- 0.33585

deepseek_0628_cpu_iq1_s-00001-of-00002.gguf
model size = 58.42 GiB (2.13 BPW)
perplexity: 94.39 seconds per pass - ETA 4.72 minutes
[1]3.3257,[2]4.7059,[3]4.3868,[4]4.8870,[5]5.3162,[6]6.0753,[7]6.2931,[8]6.5085,[9]6.8913,[10]7.3148,[11]7.4235,[12]7.6295,
Final estimate: PPL = 7.6295 +/- 0.36143

deepseek_0628_cpu_optimized_iq4xm-00001-of-00004.gguf
model size: 131 GB
perplexity: 59.49 seconds per pass - ETA 2.97 minutes
[1]2.4954,[2]3.3941,[3]2.9607,[4]3.4755,[5]3.8889,[6]4.5036,[7]4.7364,[8]4.9401,[9]5.2737,[10]5.6651,[11]5.7354,[12]5.8620,
Final estimate: PPL = 5.8620 +/- 0.26853
```
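
To reproduce these numbers you need the raw wikitext-2 test set that wiki.test.raw refers to. A minimal sketch; the dataset URL below is the one llama.cpp's scripts/get-wikitext-2.sh has used, so treat it as an assumption and verify it before relying on it:

```bash
# Sketch: fetch wikitext-2 raw and rerun the measurement above.
# URL assumed from llama.cpp's scripts/get-wikitext-2.sh; verify before use.
wget https://huggingface.co/datasets/ggml-org/ci/resolve/main/wikitext-2-raw-v1.zip
unzip wikitext-2-raw-v1.zip
./llama-perplexity -m deepseek_0628_cpu_optimized_iq4xm-00001-of-00004.gguf \
  --chunks 12 -f wikitext-2-raw/wiki.test.raw
```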

```bash
# 🏋️ For the nearly lossless Q8_0 version
aria2c -x 8 -o deepseek-0628-q8_0-00001-of-00006.gguf \
https://huggingface.co/nisten/deepseek-0628-gguf/resolve/main/deepseek-0628-q8_0-00001-of-00006.gguf
```
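
The Q8_0 model ships as six splits, so rather than typing six commands you can loop. A sketch that assumes the remaining filenames follow the same 0000N-of-00006 pattern as the first split:

```bash
# Sketch: download all six Q8_0 splits, assuming the numbering pattern holds
for i in $(seq -w 1 6); do
  f="deepseek-0628-q8_0-0000${i}-of-00006.gguf"
  aria2c -x 8 -o "$f" \
    "https://huggingface.co/nisten/deepseek-0628-gguf/resolve/main/$f"
done
```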