LavaPlanet committed on
Commit
baa8391
1 Parent(s): ac6ebd0

Update README.md

Files changed (1)
  1. README.md +12 -48
README.md CHANGED
@@ -4,56 +4,20 @@ language:
  - en
  pipeline_tag: conversational
  ---
- # Goliath 120B
- 
- An auto-regressive causal LM created by combining two finetuned [Llama-2 70B](https://huggingface.co/meta-llama/llama-2-70b-hf) models into one.
- 
- Please check out the quantized formats provided by [@TheBloke](https://huggingface.co/TheBloke) and [@Panchovix](https://huggingface.co/Panchovix):
- 
- - [GGUF](https://huggingface.co/TheBloke/goliath-120b-GGUF) (llama.cpp)
- - [GPTQ](https://huggingface.co/TheBloke/goliath-120b-GPTQ) (KoboldAI, TGW, Aphrodite)
- - [AWQ](https://huggingface.co/TheBloke/goliath-120b-AWQ) (TGW, Aphrodite, vLLM)
- - [Exllamav2](https://huggingface.co/Panchovix/goliath-120b-exl2) (TGW, KoboldAI)
- 
- # Prompting Format
- 
- Both Vicuna and Alpaca will work, but since the initial and final layers belong primarily to Xwin, I expect Vicuna to work best.
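
For reference, a typical Vicuna-style prompt looks like the following; the system sentence is the stock Vicuna v1.1 preamble, not something this card specifies.

```
A chat between a curious user and an artificial intelligence assistant. The assistant gives helpful, detailed, and polite answers to the user's questions.
USER: {prompt}
ASSISTANT:
```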
- 
- # Merge process
- 
- The models used in the merge are [Xwin](https://huggingface.co/Xwin-LM/Xwin-LM-70B-V0.1) and [Euryale](https://huggingface.co/Sao10K/Euryale-1.3-L2-70B).
- 
- The layer ranges used are as follows:
- 
- ```yaml
- - range 0, 16
- Xwin
- - range 8, 24
- Euryale
- - range 17, 32
- Xwin
- - range 25, 40
- Euryale
- - range 33, 48
- Xwin
- - range 41, 56
- Euryale
- - range 49, 64
- Xwin
- - range 57, 72
- Euryale
- - range 65, 80
- Xwin
- ```
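
For readers who want to reproduce a merge like this with [mergekit](https://github.com/cg123/mergekit), here is a hedged sketch of those ranges as a passthrough config. The slice syntax follows current mergekit; this is an illustration, not the author's actual config file.

```yaml
# Sketch: frankenmerge the two 70B models by stacking overlapping layer slices.
merge_method: passthrough   # layers are concatenated, not averaged
dtype: float16
slices:
  - sources:
      - model: Xwin-LM/Xwin-LM-70B-V0.1
        layer_range: [0, 16]
  - sources:
      - model: Sao10K/Euryale-1.3-L2-70B
        layer_range: [8, 24]
  - sources:
      - model: Xwin-LM/Xwin-LM-70B-V0.1
        layer_range: [17, 32]
  - sources:
      - model: Sao10K/Euryale-1.3-L2-70B
        layer_range: [25, 40]
  - sources:
      - model: Xwin-LM/Xwin-LM-70B-V0.1
        layer_range: [33, 48]
  - sources:
      - model: Sao10K/Euryale-1.3-L2-70B
        layer_range: [41, 56]
  - sources:
      - model: Xwin-LM/Xwin-LM-70B-V0.1
        layer_range: [49, 64]
  - sources:
      - model: Sao10K/Euryale-1.3-L2-70B
        layer_range: [57, 72]
  - sources:
      - model: Xwin-LM/Xwin-LM-70B-V0.1
        layer_range: [65, 80]
```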
- 
- # Screenshots
- 
- ![image/png](https://cdn-uploads.huggingface.co/production/uploads/635567189c72a7e742f1419c/Cat8_Rimaz6Ni7YhQiiGB.png)
- 
- # Benchmarks
- Coming soon.
- 
- # Acknowledgements
- Credit goes to [@chargoddard](https://huggingface.co/chargoddard) for developing [mergekit](https://github.com/cg123/mergekit), the framework used to merge the model.
- 
- Special thanks to [@Undi95](https://huggingface.co/Undi95) for helping with the merge ratios.
+ Another EXL2 version of AlpinDale's https://huggingface.co/alpindale/goliath-120b, this one at 2.64BPW.
+ 
+ A [2.37BPW](https://huggingface.co/LavaPlanet/Goliath120B-exl2-2.37bpw) version is also available.
+ 
+ Pippa llama2 Chat was used as the calibration dataset.
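
For context, EXL2 quants like this are typically produced with exllamav2's convert.py, pointing `-c` at the calibration data. A hedged sketch (paths and the parquet filename are placeholders; flags per recent exllamav2):

```
python convert.py \
  -i ./goliath-120b \
  -o ./work \
  -cf ./Goliath120B-exl2-2.64bpw \
  -b 2.64 \
  -c ./pippa_llama2_chat.parquet
```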
+ 
+ Can be run on two RTX 3090s with 24GB of VRAM each.
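
As a back-of-envelope check (assuming roughly 118B parameters, a figure from the base merge's size class rather than this card): 118e9 weights × 2.64 bits ÷ 8 ≈ 39 GB for the weights alone, so with KV cache and runtime overhead on top, the ~42 GB in use across the two cards in the figures below is about what you'd expect.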
+ 
+ These figures include Windows overhead, so they should be close enough to estimate your own usage:
+ ```yaml
+ 2.64BPW @ 4096 ctx
+ Empty ctx
+ GPU split: 18/24
+ GPU1: 19.8/24 GB
+ GPU2: 21.9/24 GB
+ ~10 tk/s
+ ```
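
To illustrate the two-GPU setup, here is a minimal exllamav2 loading sketch in Python. The model path is a placeholder and the split mirrors the 18/24 figure above; treat the API as an assumption against recent exllamav2 rather than something this card documents.

```python
# Hedged sketch: load the 2.64BPW EXL2 quant across two 24GB GPUs.
from exllamav2 import ExLlamaV2, ExLlamaV2Cache, ExLlamaV2Config, ExLlamaV2Tokenizer
from exllamav2.generator import ExLlamaV2BaseGenerator, ExLlamaV2Sampler

config = ExLlamaV2Config()
config.model_dir = "./Goliath120B-exl2-2.64bpw"  # placeholder local path
config.prepare()
config.max_seq_len = 4096  # matches the 4096 ctx above

model = ExLlamaV2(config)
model.load([18, 24])  # manual GPU split in GB, mirroring "GPU split: 18/24"

tokenizer = ExLlamaV2Tokenizer(config)
cache = ExLlamaV2Cache(model)  # KV cache allocated after the split
generator = ExLlamaV2BaseGenerator(model, cache, tokenizer)

settings = ExLlamaV2Sampler.Settings()
settings.temperature = 0.8

# Vicuna-style prompt, per the base model card's recommendation
print(generator.generate_simple("USER: Hello! ASSISTANT:", settings, 128))
```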