Lewdiculous
/

SOVL_Llama3_8B-GGUF-IQ-Imatrix



	---
	license: apache-2.0
	---
	# #llama-3 #experimental #work-in-progress

	GGUF-IQ-Imatrix quants for @jeiku's [ResplendentAI/SOVL_Llama3_8B](https://huggingface.co/ResplendentAI/SOVL_Llama3_8B). <br> Give them some love!

	> [!IMPORTANT]
	> Updated!
	> These quants have been redone with the fixes from [llama.cpp/pull/6920](https://github.com/ggerganov/llama.cpp/pull/6920) in mind.

	> [!NOTE]
	> Well...! <br>
	> Turns out it was not just a hallucination: this model is actually pretty cool, so give it a chance! <br>
	> For 8GB VRAM GPUs, I recommend the Q4_K_M-imat quant for context sizes up to 12288.
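
As a rough sanity check on the 8GB recommendation above, here is a minimal back-of-the-envelope sketch of the memory budget. The layer, KV-head, and head-dimension figures are the published Llama 3 8B architecture values. The weight size (`weights_gib=4.6`) and the fixed overhead (`overhead_gib=0.5`) are assumptions for illustration only, and an fp16 KV cache is assumed, so check the actual file size in this repo and your backend's settings before relying on the result.

```python
# Rough VRAM estimate for a Llama-3-8B GGUF quant at a given context size.
# This is an illustrative sketch, not the method used to pick the recommendation.

N_LAYERS = 32        # Llama 3 8B transformer layers
N_KV_HEADS = 8       # grouped-query attention: 8 key/value heads
HEAD_DIM = 128       # 4096 hidden size / 32 attention heads
BYTES_PER_ELEM = 2   # ASSUMPTION: fp16 KV cache (no KV quantization)


def kv_cache_bytes(n_ctx: int) -> int:
    """Bytes needed to hold keys and values for n_ctx tokens."""
    # The leading factor 2 accounts for storing both K and V.
    return 2 * N_LAYERS * N_KV_HEADS * HEAD_DIM * BYTES_PER_ELEM * n_ctx


def estimate_vram_gib(weights_gib: float, n_ctx: int,
                      overhead_gib: float = 0.5) -> float:
    """Weights + KV cache + a fixed allowance for compute buffers.

    weights_gib and overhead_gib are hypothetical inputs; measure them
    for the quant and backend you actually use.
    """
    return weights_gib + kv_cache_bytes(n_ctx) / 2**30 + overhead_gib


if __name__ == "__main__":
    n_ctx = 12288
    kv_gib = kv_cache_bytes(n_ctx) / 2**30
    total = estimate_vram_gib(weights_gib=4.6, n_ctx=n_ctx)
    print(f"KV cache at {n_ctx} tokens: {kv_gib:.2f} GiB")
    print(f"Estimated total: {total:.2f} GiB")
```

Under these assumptions the KV cache at 12288 tokens takes 1.5 GiB, so a quant of roughly this size leaves some headroom on an 8GB card with all layers offloaded. Larger quants or longer contexts reduce that margin quickly.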

	> [!WARNING]
	> Compatible SillyTavern presets [here (simple)](https://huggingface.co/Lewdiculous/Model-Requests/tree/main/data/presets/cope-llama-3-0.1) or [here (Virt's)](https://huggingface.co/Virt-io/SillyTavern-Presets). <br>
	> Use the latest version of KoboldCpp. Use the provided presets. <br>

	![image/png](https://cdn-uploads.huggingface.co/production/uploads/626dfb8786671a29c715f8a9/N_1D87adbMuMlSIQ5rI3_.png)