---
license: llama2
---
Another EXL2 version of AlpinDale's [goliath-120b](https://huggingface.co/alpindale/goliath-120b), this one quantized at 2.64BPW.

Also available at [2.37BPW](https://huggingface.co/LavaPlanet/Goliath120B-exl2-2.37bpw).

Pippa llama2 Chat was used as the calibration dataset.

Can be run on two RTX 3090s with 24GB VRAM each.

Assuming Windows overhead, the following figures should be close enough to estimate your own usage.
```yaml
2.64BPW @ 4096 ctx
  Empty Ctx
    GPU Split:18/24
    GPU1: 19.8/24
    GPU2: 21.9/24
    ~10 tk/s
```
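
As a rough sanity check on those figures, the weight footprint can be estimated from the bits per weight. This is only a sketch: the parameter count is an assumption (a nominal 120 billion, taken from the model name; the real count may differ), and it ignores the context cache, activations, and OS/driver overhead, which account for the rest of the reported usage.

```python
def weight_gb(n_params: float, bpw: float) -> float:
    """Approximate size of the quantized weights in GB (10^9 bytes)."""
    return n_params * bpw / 8 / 1e9

# Assumption: nominal parameter count inferred from the model name.
N_PARAMS = 120e9

for bpw in (2.37, 2.64):
    print(f"{bpw} BPW: ~{weight_gb(N_PARAMS, bpw):.1f} GB of weights")

# Per-GPU figures reported above for 2.64BPW at 4096 ctx, empty context.
reported_total = 19.8 + 21.9
print(f"Reported total usage: {reported_total:.1f} of 48")
```

The estimated weights alone come in below the reported total, which fits with the remainder going to cache and overhead.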