---
license: llama2
language:
- en
pipeline_tag: conversational
---
Another EXL2 quantization of AlpinDale's https://huggingface.co/alpindale/goliath-120b, this one at 2.64BPW. A [2.37BPW](https://huggingface.co/LavaPlanet/Goliath120B-exl2-2.37bpw) version is also available.
PIPPA (llama2 chat format) was used as the calibration dataset.
It can be run on two RTX 3090s with 24GB of VRAM each. The figures below were measured on Windows (which adds some overhead) and should be close enough to estimate your own usage.
```yaml
2.64BPW @ 4096 ctx, empty context
GPU Split: 18/24
GPU1: 19.8/24 GB
GPU2: 21.9/24 GB
Speed: ~10 tok/s
```
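As a rough sanity check, the raw weight footprint can be estimated from the bits-per-weight figure alone. The sketch below assumes ~120e9 parameters (taken from the model name); real VRAM use adds the KV cache and framework overhead on top of this.

```python
# Estimate raw weight storage for an EXL2 quant from bits per weight (BPW).
# Assumes ~120e9 parameters (from the model name); actual VRAM usage also
# includes the KV cache, activations, and framework/OS overhead.

def weight_footprint_gb(n_params: float, bpw: float) -> float:
    """Raw weight storage in gigabytes (decimal GB)."""
    return n_params * bpw / 8 / 1e9

print(weight_footprint_gb(120e9, 2.64))  # ≈ 39.6 GB of weights alone
print(weight_footprint_gb(120e9, 2.37))  # ≈ 35.6 GB for the 2.37BPW variant
```

The ~39.6 GB of weights plus cache and overhead lines up with the ~41.7 GB total (19.8 + 21.9) observed above at empty context.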