maddes8cht
/

tiiuae-falcon-7b-instruct-gguf

maddes8cht committed on Nov 19, 2023

Commit

5c9d704

•

1 Parent(s): 0ba5e16

"Update README.md"

Files changed (1)

README.md +1 -12

@@ -29,24 +29,13 @@ I'm constantly enhancing these model descriptions to provide you with the most r
 # K-Quants in Falcon 7b models
-New releases of Llama.cpp now support K-quantization for previously incompatible models, in particular all Falcon 7B models. This is achieved by employing a fallback solution for model layers that cannot be quantized with real K-quants.
 For Falcon 7B models, although only a quarter of the layers can be quantized with true K-quants, this approach still benefits from utilizing *different* legacy quantization types Q4_0, Q4_1, Q5_0, and Q5_1. As a result, it offers better quality at the same file size or smaller file sizes with comparable performance.
 So this solution ensures improved performance and efficiency over legacy Q4_0, Q4_1, Q5_0 and Q5_1 Quantizations.
-# Important Update for Falcon Models in llama.cpp Versions After October 18, 2023
-As previously noted on the [Llama.cpp GitHub repository](https://github.com/ggerganov/llama.cpp#hot-topics), all new Llama.cpp releases after October 18, 2023, required re-quantization due to the implementation of the new BPE tokenizer.
-This re-quantization process for Falcon models is now complete, and the latest quantized models are available here for download. To ensure continued compatibility with recent llama.cpp software, you need to update your Falcon models.
-- **Stay Informed:** Keep an eye on software application release schedules using llama.cpp libraries.
-- **Monitor Upload Times:** Re-quantization is complete. Watch for updates on my Hugging Face Model pages.
-This change only affects **Falcon** and **Starcoder** models, with other models remaining unaffected.
 ---

 # K-Quants in Falcon 7b models
+New releases of Llama.cpp now support K-quantization for previously incompatible models, in particular all Falcon 7B models (Falcon 40B, by contrast, is and always has been fully compatible with K-quantization). This is achieved by employing a fallback solution for model layers that cannot be quantized with real K-quants.
 For Falcon 7B models, although only a quarter of the layers can be quantized with true K-quants, this approach still benefits from utilizing *different* legacy quantization types Q4_0, Q4_1, Q5_0, and Q5_1. As a result, it offers better quality at the same file size or smaller file sizes with comparable performance.
 So this solution ensures improved performance and efficiency over legacy Q4_0, Q4_1, Q5_0 and Q5_1 Quantizations.
 ---
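
The fallback described in the README can be illustrated with a minimal sketch. Everything here is an assumption rather than a copy of llama.cpp's code: the super-block size `QK_K = 256`, the `FALLBACK` table, the function name `choose_quant_type`, and the row sizes used for Falcon 7B (4544 and 18176) and Falcon 40B (8192). The idea is only that a tensor keeps a true K-quant when its row size is a multiple of the super-block size and otherwise drops to a legacy type.

```python
# Illustrative sketch only; not llama.cpp's actual implementation.

QK_K = 256  # assumed K-quant super-block size

# Hypothetical mapping from a requested K-quant to a legacy fallback type.
FALLBACK = {
    "Q2_K": "Q4_0",
    "Q3_K": "Q4_1",
    "Q4_K": "Q5_0",
    "Q5_K": "Q5_1",
}


def choose_quant_type(row_size: int, requested: str) -> str:
    """Return the type a tensor would get under the assumed fallback rule."""
    if requested not in FALLBACK:
        # Not a K-quant covered by this sketch; leave the request unchanged.
        return requested
    if row_size % QK_K == 0:
        # Rows split evenly into super-blocks, so a true K-quant fits.
        return requested
    # Rows do not split evenly; use the legacy type instead.
    return FALLBACK[requested]


if __name__ == "__main__":
    # Assumed row sizes: 4544 and 18176 for Falcon 7B, 8192 for Falcon 40B.
    for row_size in (4544, 18176, 8192):
        print(row_size, choose_quant_type(row_size, "Q4_K"))
```

Under these assumed sizes, only the Falcon 7B tensors with 18176-element rows keep a true K-quant, which is in line with the README's remark that only part of the model can be quantized with real K-quants.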