---
tags:
  - uqff
  - mistral.rs
base_model: meta-llama/Llama-3.1-8B-Instruct
base_model_relation: quantized
---

`meta-llama/Llama-3.1-8B-Instruct`, UQFF quantization

Run with mistral.rs. Documentation: UQFF docs.

Flexible 🌀: Multiple quantization formats in one file format with one framework to run them all.
Reliable 🔒: Compatibility ensured with embedded and checked semantic versioning information from day 1.
Easy 🤗: Download UQFF models easily and quickly from Hugging Face, or use a local file.
Customizable 🛠️: Make and publish your own UQFF files in minutes.

Examples

| Quantization type(s) | Example |
|--|--|
| FP8 | `./mistralrs-server -i plain -m EricB/Llama-3.1-8B-Instruct-UQFF --from-uqff llama3.1-8b-instruct-f8e4m3.uqff` |
| HQQ4 | `./mistralrs-server -i plain -m EricB/Llama-3.1-8B-Instruct-UQFF --from-uqff llama3.1-8b-instruct-hqq4.uqff` |
| HQQ8 | `./mistralrs-server -i plain -m EricB/Llama-3.1-8B-Instruct-UQFF --from-uqff llama3.1-8b-instruct-hqq8.uqff` |
| Q3K | `./mistralrs-server -i plain -m EricB/Llama-3.1-8B-Instruct-UQFF --from-uqff llama3.1-8b-instruct-q3k.uqff` |
| Q4K | `./mistralrs-server -i plain -m EricB/Llama-3.1-8B-Instruct-UQFF --from-uqff llama3.1-8b-instruct-q4k.uqff` |
| Q5K | `./mistralrs-server -i plain -m EricB/Llama-3.1-8B-Instruct-UQFF --from-uqff llama3.1-8b-instruct-q5k.uqff` |
| Q8_0 | `./mistralrs-server -i plain -m EricB/Llama-3.1-8B-Instruct-UQFF --from-uqff llama3.1-8b-instruct-q8_0.uqff` |
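
The example commands above differ only in the file passed to `--from-uqff`, and the file names appear to follow one pattern: `llama3.1-8b-instruct-<suffix>.uqff`, where the suffix is the lowercased quantization type, except FP8, which uses `f8e4m3`. The following is a minimal sketch of a shell helper that builds the command from a quantization type. The function names `uqff_file` and `uqff_command` are hypothetical and not part of mistral.rs, and the naming pattern is inferred only from the examples listed here. The helper prints the command rather than running it.

```shell
#!/bin/sh
# Hypothetical helpers (not part of mistral.rs): map a quantization type
# to the UQFF file name used in the examples above.
uqff_file() {
  case "$1" in
    # FP8 is the one type whose file suffix is not just the lowercased name.
    FP8) suffix="f8e4m3" ;;
    HQQ4|HQQ8|Q3K|Q4K|Q5K|Q8_0)
      suffix=$(printf '%s' "$1" | tr '[:upper:]' '[:lower:]') ;;
    *)
      echo "unknown quantization type: $1" >&2
      return 1 ;;
  esac
  printf 'llama3.1-8b-instruct-%s.uqff\n' "$suffix"
}

# Print (do not execute) the full mistralrs-server command for a type.
uqff_command() {
  file=$(uqff_file "$1") || return 1
  printf './mistralrs-server -i plain -m EricB/Llama-3.1-8B-Instruct-UQFF --from-uqff %s\n' "$file"
}

uqff_command Q4K
```

To run the printed command directly, something like `eval "$(uqff_command Q4K)"` should work from the directory containing `mistralrs-server`.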