Update README.md
README.md
CHANGED
base_model: [deepseek-ai/DeepSeek-V2-Chat-0628]

#### 🚀 Custom quantizations of DeepSeek-V2-Chat-0628, supercharged for CPU inference of what is currently the #7 model globally on the lmsys arena-hard leaderboard! 🖥️

>### 🚄 Just download this IQ4XM 131 GB version; it's the one I use myself:
>
>🐧 On Linux: `sudo apt install -y aria2`
>
>🍎 On Mac: `brew install aria2`

```bash
aria2c -x 9 -o deepseek_0628_cpu_optimized_iq4xm-00001-of-00004.gguf \
  https://huggingface.co/nisten/deepseek-0628-gguf/resolve/main/deepseek_0628_cpu_optimized_iq4xm-00001-of-00004.gguf
aria2c -x 9 -o deepseek_0628_cpu_optimized_iq4xm-00002-of-00004.gguf \
  https://huggingface.co/nisten/deepseek-0628-gguf/resolve/main/deepseek_0628_cpu_optimized_iq4xm-00002-of-00004.gguf
aria2c -x 9 -o deepseek_0628_cpu_optimized_iq4xm-00003-of-00004.gguf \
  https://huggingface.co/nisten/deepseek-0628-gguf/resolve/main/deepseek_0628_cpu_optimized_iq4xm-00003-of-00004.gguf
aria2c -x 9 -o deepseek_0628_cpu_optimized_iq4xm-00004-of-00004.gguf \
  https://huggingface.co/nisten/deepseek-0628-gguf/resolve/main/deepseek_0628_cpu_optimized_iq4xm-00004-of-00004.gguf
```
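
If you'd rather not paste four nearly identical commands, the same downloads can be written as a short loop; this is just a convenience form of the block above, with the same shard names and URLs:

```bash
# Convenience form of the four downloads above; same shard names and URLs.
for i in 1 2 3 4; do
  f="deepseek_0628_cpu_optimized_iq4xm-0000${i}-of-00004.gguf"
  aria2c -x 9 -o "$f" "https://huggingface.co/nisten/deepseek-0628-gguf/resolve/main/$f"
done
```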

>[!TIP]
>Then, to get a command-line conversation interface, all you need is:

```bash
git clone https://github.com/ggerganov/llama.cpp && cd llama.cpp && make -j
./llama-cli -m ~/r/deepseek_0628_cpu_optimized_iq4xm-00001-of-00004.gguf -t 62 --temp 0.4 -co -cnv -i -c 3000 -p "Adopt the persona of a full-stack developer at NASA JPL."
```
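
If you would rather talk to the model over HTTP than in the terminal, the `llama-server` binary built by the same `make -j` can serve the same first shard. A minimal sketch, reusing the thread count and context size from the command above; the host and port are placeholders to adjust for your setup:

```bash
# Serve the model over HTTP instead of the interactive CLI; llama-server ships with llama.cpp.
./llama-server -m ~/r/deepseek_0628_cpu_optimized_iq4xm-00001-of-00004.gguf \
  -t 62 -c 3000 --host 0.0.0.0 --port 8080
```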

### 🧠 This IQ4XM version uses GGML TYPE IQ_4_XS 4-bit in combination with q8_0 for blazing-fast performance with minimal loss, leveraging int8 optimizations on most newer server CPUs.
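
For anyone curious how a mixed quant along these lines is put together: llama.cpp's `llama-quantize` tool can pin the token-embedding and output tensors to q8_0 while quantizing the rest of the weights to IQ4_XS. The sketch below only illustrates that mechanism; it is not the exact recipe behind this upload (the input GGUF filename is a placeholder, and the real build involved additional custom tweaks):

```bash
# Illustration only, not the recipe used for this repo's files.
# "deepseek-v2-chat-0628-f16.gguf" is a placeholder for a full-precision GGUF conversion.
./llama-quantize --token-embedding-type q8_0 --output-tensor-type q8_0 \
  deepseek-v2-chat-0628-f16.gguf deepseek_0628_iq4xm_example.gguf IQ4_XS
```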
### 🛠️ While it required some custom code wizardry, it's fully compatible with standard llama.cpp from GitHub, or just search for nisten in LM Studio.