Update README.md
README.md
CHANGED
base_model: [deepseek-ai/DeepSeek-V2-Chat-0628]

#### 🚀 Custom quantizations of DeepSeek-V2-Chat-0628, supercharged for CPU inference of what is currently the #7 model globally on the lmsys arena-hard leaderboard! 🖥️

>### 🚄 Just download this IQ4XM 131 GB version; it's the one I use myself:
>
>🐧 On Linux: `sudo apt install -y aria2`
>
>🍎 On Mac: `brew install aria2`

```bash
aria2c -x 9 -o deepseek_0628_cpu_optimized_iq4xm-00001-of-00004.gguf \
  https://huggingface.co/nisten/deepseek-0628-gguf/resolve/main/deepseek_0628_cpu_optimized_iq4xm-00001-of-00004.gguf
aria2c -x 9 -o deepseek_0628_cpu_optimized_iq4xm-00002-of-00004.gguf \
  https://huggingface.co/nisten/deepseek-0628-gguf/resolve/main/deepseek_0628_cpu_optimized_iq4xm-00002-of-00004.gguf
aria2c -x 9 -o deepseek_0628_cpu_optimized_iq4xm-00003-of-00004.gguf \
  https://huggingface.co/nisten/deepseek-0628-gguf/resolve/main/deepseek_0628_cpu_optimized_iq4xm-00003-of-00004.gguf
aria2c -x 9 -o deepseek_0628_cpu_optimized_iq4xm-00004-of-00004.gguf \
  https://huggingface.co/nisten/deepseek-0628-gguf/resolve/main/deepseek_0628_cpu_optimized_iq4xm-00004-of-00004.gguf
```
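
If you'd rather not paste four nearly identical commands, the same downloads can be written as a short loop; this is just a convenience form of the block above, with the same shard names and URLs:

```bash
# Convenience form of the four downloads above; same shard names and URLs.
for i in 1 2 3 4; do
  f="deepseek_0628_cpu_optimized_iq4xm-0000${i}-of-00004.gguf"
  aria2c -x 9 -o "$f" "https://huggingface.co/nisten/deepseek-0628-gguf/resolve/main/$f"
done
```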

>[!TIP]
>Then, to get a command-line conversation interface, all you need is:

```bash
git clone https://github.com/ggerganov/llama.cpp && cd llama.cpp && make -j
./llama-cli -m ~/r/deepseek_0628_cpu_optimized_iq4xm-00001-of-00004.gguf -t 62 --temp 0.4 -co -cnv -i -c 3000 -p "Adopt the persona of a full-stack developer at NASA JPL."
```
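
If you would rather talk to the model over HTTP than in the terminal, the `llama-server` binary built by the same `make -j` can serve the same first shard. A minimal sketch, reusing the thread count and context size from the command above; the host and port are placeholders to adjust for your setup:

```bash
# Serve the model over HTTP instead of the interactive CLI; llama-server ships with llama.cpp.
./llama-server -m ~/r/deepseek_0628_cpu_optimized_iq4xm-00001-of-00004.gguf \
  -t 62 -c 3000 --host 0.0.0.0 --port 8080
```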

### 🧠 This IQ4XM version uses GGML TYPE IQ_4_XS 4-bit in combination with q8_0 for blazing-fast performance with minimal loss, leveraging int8 optimizations on most newer server CPUs.
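
For anyone curious how a mixed quant along these lines is put together: llama.cpp's `llama-quantize` tool can pin the token-embedding and output tensors to q8_0 while quantizing the rest of the weights to IQ4_XS. The sketch below only illustrates that mechanism; it is not the exact recipe behind this upload (the input GGUF filename is a placeholder, and the real build involved additional custom tweaks):

```bash
# Illustration only, not the recipe used for this repo's files.
# "deepseek-v2-chat-0628-f16.gguf" is a placeholder for a full-precision GGUF conversion.
./llama-quantize --token-embedding-type q8_0 --output-tensor-type q8_0 \
  deepseek-v2-chat-0628-f16.gguf deepseek_0628_iq4xm_example.gguf IQ4_XS
```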
### 🛠️ While it required some custom code wizardry, it's fully compatible with standard llama.cpp from GitHub, or just search for nisten in LM Studio.