maddes8cht committed
Commit 69e302d
Parent(s): aadb36a
"Update README.md"

README.md CHANGED
@@ -40,19 +40,21 @@ The core project making use of the ggml library is the [llama.cpp](https://githu
 
 # Quantization variants
 
-There is a bunch of quantized files available.
+There are a bunch of quantized files available to cater to your specific needs. Here's how to choose the best option for you:
 
 # Legacy quants
 
 Q4_0, Q4_1, Q5_0, Q5_1 and Q8 are `legacy` quantization types.
 Nevertheless, they are fully supported, as there are several circumstances that cause certain models not to be compatible with the modern K-quants.
-
+## Note:
+Now there's a new option to use K-quants even for previously 'incompatible' models, although this involves some fallback solution that makes them not *real* K-quants. More details can be found in the affected model descriptions.
+(This mainly refers to Falcon 7b and Starcoder models)
 
 # K-quants
 
-K-quants are
+K-quants are designed with the idea that different levels of quantization in specific parts of the model can optimize performance, file size, and memory load.
 So, if possible, use K-quants.
-With a Q6_K you
+With a Q6_K, you'll likely find it challenging to discern a quality difference from the original model - ask your model the same question twice and you may encounter bigger quality differences.
 
 
 
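
The quant types named in this diff trade file size against fidelity. As a rough, illustrative sketch (the bits-per-weight figures below are approximations chosen for this example, not values taken from this repository), the on-disk size of a quantized model can be estimated from its parameter count:

```python
# Rough file-size estimate for a quantized model.
# Bits-per-weight values are approximate and for illustration only:
# K-quants mix precision levels across tensors, so their effective
# bits-per-weight differs from the nominal bit width in the name.
BITS_PER_WEIGHT = {
    "Q4_0": 4.5,   # legacy quant
    "Q5_0": 5.5,   # legacy quant
    "Q8_0": 8.5,   # legacy quant
    "Q4_K": 4.8,   # K-quant
    "Q6_K": 6.6,   # K-quant
}

def estimated_size_gb(n_params: float, quant: str) -> float:
    """Approximate model file size in gigabytes for a given quant type."""
    return n_params * BITS_PER_WEIGHT[quant] / 8 / 1e9

# A 7B-parameter model at Q4_0 comes out near 4 GB:
print(round(estimated_size_gb(7e9, "Q4_0"), 2))  # → 3.94
```

This is only a back-of-the-envelope estimate; actual GGUF files also carry metadata and unquantized tensors, so real sizes differ somewhat.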