NeoChen1024/dolphin-2.7-mixtral-8x7b-GGUF-IQ3_M

	---
	license: apache-2.0
	base_model:
	- cognitivecomputations/dolphin-2.7-mixtral-8x7b
	---

	GGUF IQ3_M quantization of cognitivecomputations/dolphin-2.7-mixtral-8x7b, provided in both non-imatrix and imatrix variants.

	It fits into 24 GiB of VRAM with a 32768-token context when the KV cache is quantized to 8 bits.
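
To see how much the 8-bit KV cache contributes to that fit, here is a rough back-of-the-envelope sketch. It assumes the attention shape from the base Mixtral 8x7B config (32 layers, 8 KV heads, head dimension 128) and assumes "8bit" means a `q8_0`-style layout (32 int8 values plus one fp16 scale per block, i.e. 34 bytes per 32 values). The function name `kv_cache_gib` is hypothetical, and the estimate covers only the KV cache, not model weights or compute buffers.

```python
# Assumed Mixtral 8x7B attention shape (taken from the base model's config).
N_LAYERS = 32
N_KV_HEADS = 8
HEAD_DIM = 128

# Bytes per cached element.
# Assumption: "8bit" is a q8_0-style block of 32 int8 values + one fp16 scale
# (34 bytes per 32 values).
BYTES_PER_ELEMENT = {"f16": 2.0, "q8_0": 34 / 32}


def kv_cache_gib(n_ctx, cache_type):
    """Estimate KV cache size in GiB for a given context length and cache type."""
    # Each token stores one K and one V vector per layer per KV head.
    elements_per_token = 2 * N_LAYERS * N_KV_HEADS * HEAD_DIM
    total_bytes = n_ctx * elements_per_token * BYTES_PER_ELEMENT[cache_type]
    return total_bytes / 2**30


for cache_type in ("f16", "q8_0"):
    print(f"{cache_type}: {kv_cache_gib(32768, cache_type):.3f} GiB")
# f16: 4.000 GiB
# q8_0: 2.125 GiB
```

Under these assumptions, quantizing the cache to 8 bits saves a little under 2 GiB at 32768 context compared with an fp16 cache, which is the headroom that lets the full context fit alongside the weights.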