---
license: llama2
language:
- en
pipeline_tag: conversational
---
Another EXL2 quantization of AlpinDale's https://huggingface.co/alpindale/goliath-120b, this one at 2.64BPW. A [2.37BPW](https://huggingface.co/LavaPlanet/Goliath120B-exl2-2.37bpw) version is also available.
PIPPA (llama2 chat format) was used as the calibration dataset.
It can be run on two RTX 3090s with 24GB of VRAM each. The figures below were measured on Windows (which adds some overhead) and should be close enough to estimate your own usage.
```yaml
2.64BPW @ 4096 ctx, empty context
GPU Split: 18/24
GPU1: 19.8/24 GB
GPU2: 21.9/24 GB
Speed: ~10 tok/s
```
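As a rough sanity check, the raw weight footprint can be estimated from the bits-per-weight figure alone. The sketch below assumes ~120e9 parameters (taken from the model name); real VRAM use adds the KV cache and framework overhead on top of this.

```python
# Estimate raw weight storage for an EXL2 quant from bits per weight (BPW).
# Assumes ~120e9 parameters (from the model name); actual VRAM usage also
# includes the KV cache, activations, and framework/OS overhead.

def weight_footprint_gb(n_params: float, bpw: float) -> float:
    """Raw weight storage in gigabytes (decimal GB)."""
    return n_params * bpw / 8 / 1e9

print(weight_footprint_gb(120e9, 2.64))  # ≈ 39.6 GB of weights alone
print(weight_footprint_gb(120e9, 2.37))  # ≈ 35.6 GB for the 2.37BPW variant
```

The ~39.6 GB of weights plus cache and overhead lines up with the ~41.7 GB total (19.8 + 21.9) observed above at empty context.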