Update README.md
README.md CHANGED
@@ -7,7 +7,7 @@ base_model: [mattshumer/Reflection-Llama-3.1-70B]
 # This gets 99.96% perplexity at a 50 GB file size, whereas fp8 (not tested on this model) is known to be 97-98.8%.
 
 
-Only posting one quant because it's really annoying to make these and I haven't automated it yet; it takes 30+ model iterations, as I have to recompile llama.cpp for every build/test step until the lowest-weight configs are found.
+Only posting one quant because it's really annoying to make these and I haven't automated it yet; it takes 30+ model iterations, as I have to recompile llama.cpp for every build/test step until the quantization config with the lowest perplexity loss per weight is found. The end result saves 5 GB of space vs. regular q6_k.
 
 >🐧 To download faster on Linux: `sudo apt install -y aria2`
 >🍎 On Mac: `brew install aria2`
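
Once aria2 is installed, a typical invocation looks something like the sketch below. The output filename and URL here are placeholders, not the actual file in this repo; copy the real link from the repo's "Files and versions" tab.

```bash
# Hypothetical download command: 16 parallel connections per server.
# Replace <user>/<repo>/<file> with the real values from this repo's file listing.
aria2c -x 16 -s 16 -o Reflection-Llama-3.1-70B.q6_k.gguf \
  "https://huggingface.co/<user>/<repo>/resolve/main/<file>.gguf"
```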
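
For context on the 30+ iteration loop mentioned in the change above, here is a hedged sketch of what one build/quantize/test cycle might look like using llama.cpp's stock tools. The file names, dataset path, and the assumption that the per-tensor type choices are edited directly in the llama.cpp source (hence the rebuild each cycle) are illustrative, not the author's confirmed workflow.

```bash
# One candidate iteration, assuming a CMake checkout of llama.cpp.

# 1. Rebuild after editing the hardcoded per-tensor quant choices in the source.
cmake --build build --target llama-quantize llama-perplexity -j

# 2. Quantize the f16 base model with the current candidate config.
./build/bin/llama-quantize Reflection-Llama-3.1-70B-f16.gguf candidate.gguf Q6_K

# 3. Score the candidate; keep the config only if perplexity per GB improves.
./build/bin/llama-perplexity -m candidate.gguf -f wikitext-2-raw/wiki.test.raw
```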