Update README.md
README.md CHANGED
@@ -7,7 +7,7 @@ base_model: [mattshumer/Reflection-Llama-3.1-70B]
 # This gets 99.96% perplexity at a 50 GB file size, whereas fp8 (not tested on this model) is known to be 97-98.8%.
 
 
-Only posting one quant because it's really annoying to make these and I haven't automated it yet; it takes 30+ model iterations, as I have to recompile llama.cpp for every build/test step until the lowest-weight configs are found.
+Only posting one quant because it's really annoying to make these and I haven't automated it yet; it takes 30+ model iterations, as I have to recompile llama.cpp for every build/test step until the quantization config with the lowest perplexity loss per weight is found. The end result saves 5 GB of space vs. regular q6_k.
 
 >🐧 To download faster on Linux: `sudo apt install -y aria2`
 >🍎 On Mac: `brew install aria2`
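
Once aria2 is installed, a typical invocation looks something like the sketch below. The output filename and URL here are placeholders, not the actual file in this repo; copy the real link from the repo's "Files and versions" tab.

```bash
# Hypothetical download command: 16 parallel connections per server.
# Replace <user>/<repo>/<file> with the real values from this repo's file listing.
aria2c -x 16 -s 16 -o Reflection-Llama-3.1-70B.q6_k.gguf \
  "https://huggingface.co/<user>/<repo>/resolve/main/<file>.gguf"
```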
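
For context on the 30+ iteration loop mentioned in the change above, here is a hedged sketch of what one build/quantize/test cycle might look like using llama.cpp's stock tools. The file names, dataset path, and the assumption that the per-tensor type choices are edited directly in the llama.cpp source (hence the rebuild each cycle) are illustrative, not the author's confirmed workflow.

```bash
# One candidate iteration, assuming a CMake checkout of llama.cpp.

# 1. Rebuild after editing the hardcoded per-tensor quant choices in the source.
cmake --build build --target llama-quantize llama-perplexity -j

# 2. Quantize the f16 base model with the current candidate config.
./build/bin/llama-quantize Reflection-Llama-3.1-70B-f16.gguf candidate.gguf Q6_K

# 3. Score the candidate; keep the config only if perplexity per GB improves.
./build/bin/llama-perplexity -m candidate.gguf -f wikitext-2-raw/wiki.test.raw
```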