readme: add new llama.cpp release info
README.md CHANGED
@@ -7,21 +7,20 @@ tags:
 - deepseek
 - gguf
 - bf16
-- chinese
-- english
 metrics:
 - accuracy
+language:
+- en
+- zh
 ---
 
 # Deepseek-V2-Chat-GGUF
 
 Quantized from [https://huggingface.co/deepseek-ai/DeepSeek-V2-Chat](https://huggingface.co/deepseek-ai/DeepSeek-V2-Chat)
 
-Using llama.cpp
+Using llama.cpp b3026 for quantization
 
-
-
-# Warning: This will not work unless you compile llama.cpp from the repo provided (and set metadata KV overrides)!
+# Warning: This will not work unless you set metadata KV overrides, nor will it work in LM Studio or similar wrapper apps!
 
 # How to use:
 
@@ -79,27 +78,28 @@ quantize \
 # Quants:
 ```
 - bf16 [size: 439gb]
-- q8_0
+- q8_0 [estimated size: 233.27gb]
 - q4_k_m [size: 132gb]
 - q2_k [size: 80gb]
 - iq2_xxs [size: 61.5gb]
 - iq3_xs (uploading) [size: 89.6gb]
-- iq1_m [size: 27.3gb]
+- iq1_m (uploading) [size: 27.3gb]
+- q3_k_m (uploading) [size: 92.6gb]
 ```
 
 Note: Use iMatrix quants only if you can fully offload to GPU; otherwise speed will be affected a lot.
 
-# Planned Quants (
+# Planned Quants (weighted/imatrix):
 ```
 - q5_k_m
 - q5_k_s
-- q3_k_m
 - q6_k
 - iq4_nl
 - iq4_xs
 - iq2_xs
 - iq2_s
 - iq2_m
+- iq3_xxs
 - iq1_s (note: for fun only, this quant is likely useless)
 ```
 
@@ -113,7 +113,7 @@ deepseek2.expert_shared_count=int:2
 deepseek2.expert_feed_forward_length=int:1536
 deepseek2.experts_weight_scale=int:16
 deepseek2.leading_dense_block_count=int:1
-rope.scaling.yarn_log_multiplier=float:0.0707
+deepseek2.rope.scaling.yarn_log_multiplier=float:0.0707
 ```
 
 A precompiled AVX2 build is available as `llama.cpp-039896407afd40e54321d47c5063c46a52da3e01.zip` in the root of this repo.
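The last hunk touches the metadata KV overrides that the warning in the first hunk refers to. A minimal sketch of how such overrides might be passed at load time is shown below, assuming llama.cpp's repeatable `--override-kv key=type:value` flag and the b3026-era `main` binary; the model filename, prompt, and the `-ngl 99` full-offload value are placeholders, not taken from this repo.

```bash
# Sketch only: load a quant with the DeepSeek-V2 metadata KV overrides applied.
# Each --override-kv uses the same key=type:value form shown in the hunk above.
# Model path, prompt, and the -ngl value are placeholders.
./main -m DeepSeek-V2-Chat.Q4_K_M.gguf \
  --override-kv deepseek2.expert_shared_count=int:2 \
  --override-kv deepseek2.expert_feed_forward_length=int:1536 \
  --override-kv deepseek2.experts_weight_scale=int:16 \
  --override-kv deepseek2.leading_dense_block_count=int:1 \
  --override-kv deepseek2.rope.scaling.yarn_log_multiplier=float:0.0707 \
  -ngl 99 \
  -p "Hello"
```

Per the note in the second hunk, `-ngl 99` (full GPU offload) is the scenario where the iMatrix quants are worth using; with partial offload the regular k-quants are the safer choice.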
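For the second hunk, whose context line is the README's `quantize \` invocation, here is a minimal sketch of how one of the listed quants could be produced with the b3026-era `quantize` tool. The input/output filenames and the `imatrix.dat` path are assumptions for illustration; `--imatrix` is only relevant for the weighted/imatrix (iq*) quants.

```bash
# Sketch only, not the exact command used for this repo:
# quantize the bf16 GGUF down to iq2_xxs using an importance matrix.
# Filenames and the imatrix path are placeholders.
./quantize \
  --imatrix imatrix.dat \
  DeepSeek-V2-Chat.BF16.gguf \
  DeepSeek-V2-Chat.IQ2_XXS.gguf \
  IQ2_XXS
```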