# Llamacpp Quantizations of Llama-3.1-Herrsimian-8B

Using llama.cpp release b3703 for quantization.
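For context, GGUF quants like these generally come from a two-step llama.cpp workflow: convert the Hugging Face checkpoint to a full-precision GGUF, then quantize it down. Below is a minimal sketch of that workflow in Python; the llama.cpp checkout, build directory, and model path are assumptions, not the exact commands used for this repo.

```python
import subprocess

# Assumptions: a llama.cpp checkout at release b3703 with binaries built
# under ./build/bin, and the original model cloned to ./Llama-3.1-Herrsimian-8B.

# Step 1: convert the Hugging Face checkpoint to a full-precision GGUF file.
subprocess.run(
    [
        "python", "convert_hf_to_gguf.py", "./Llama-3.1-Herrsimian-8B",
        "--outfile", "Llama-3.1-Herrsimian-8B-F16.gguf",
        "--outtype", "f16",
    ],
    check=True,
)

# Step 2: quantize the F16 GGUF down to one of the listed types, e.g. Q4_K_M.
subprocess.run(
    [
        "./build/bin/llama-quantize",
        "Llama-3.1-Herrsimian-8B-F16.gguf",
        "Llama-3.1-Herrsimian-8B-Q4_K_M.gguf",
        "Q4_K_M",
    ],
    check=True,
)
```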

Original model: https://huggingface.co/lemonilia/Llama-3.1-Herrsimian-8B

## Quant Types

| Filename | Quant type | File Size | Required VRAM at 32k ctx |
| -------- | ---------- | --------- | ------------------------ |
| Llama-3.1-Herrsimian-8B-F16.gguf | F16 | 14.9GB | 18.6GB |
| Llama-3.1-Herrsimian-8B-Q8_0.gguf | Q8_0 | 7.95GB | 14.0GB |
| Llama-3.1-Herrsimian-8B-Q6_K.gguf | Q6_K | 6.14GB | 12.2GB |
| Llama-3.1-Herrsimian-8B-Q5_K_M.gguf | Q5_K_M | 5.33GB | 11.4GB |
| Llama-3.1-Herrsimian-8B-Q5_K_S.gguf | Q5_K_S | 5.21GB | 11.3GB |
| Llama-3.1-Herrsimian-8B-Q4_K_M.gguf | Q4_K_M | 4.58GB | 10.6GB |
| Llama-3.1-Herrsimian-8B-Q4_K_S.gguf | Q4_K_S | 4.37GB | 10.4GB |
| Llama-3.1-Herrsimian-8B-Q3_K_L.gguf | Q3_K_L | 4.02GB | 10.1GB |
| Llama-3.1-Herrsimian-8B-Q3_K_M.gguf | Q3_K_M | 3.74GB | 9.7GB |
| Llama-3.1-Herrsimian-8B-Q3_K_S.gguf | Q3_K_S | 3.41GB | 9.4GB |
| Llama-3.1-Herrsimian-8B-Q2_K.gguf | Q2_K | 2.95GB | 9.2GB |
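
The VRAM column presumably covers the model weights plus the KV cache and compute buffers at a 32768-token context; quantization mostly shrinks the weights, which is why the gap between file size and required VRAM stays roughly constant (about 6GB) across the quantized rows.

A minimal sketch for fetching one of the quants with `huggingface_hub` and loading it with llama.cpp's `llama-cli`; the quant choice, binary path, and offload flags are illustrative assumptions, not requirements.

```python
import subprocess
from huggingface_hub import hf_hub_download

# Download a single quant from this repo; returns the local file path.
model_path = hf_hub_download(
    repo_id="knifeayumu/Llama-3.1-Herrsimian-8B-GGUF",
    filename="Llama-3.1-Herrsimian-8B-Q4_K_M.gguf",
)

# Run it with llama.cpp's CLI: -c sets the context size the VRAM column
# assumes, -ngl 99 offloads all layers to the GPU. The binary path below
# assumes a local llama.cpp build.
subprocess.run(
    [
        "./build/bin/llama-cli",
        "-m", model_path,
        "-c", "32768",
        "-ngl", "99",
        "-p", "Hello",
    ],
    check=True,
)
```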