```
  e88 88e                              d8
 d888 888b  8888 8888  ,"Y88b  888 8e  d88
C8888 8888D 8888 8888 "8" 888  888 88b d88888
 Y888 888P  Y888 888P ,ee 888  888 888  888
  "88 88"    "88 88"  "88 888  888 888  888
       b
       8b,

  e88'Y88                       d8           888
 d888  'Y  ,"Y88b  888,8,      d88    ,e e,  888
C8888     "8" 888  888 "      d88888 d88 88b 888
 Y888  ,d  ,ee 888 888         888   888   , 888
  "88,d88  "88 888 888         888    "YeeP" 888

          PROUDLY PRESENTS
```
# Llama-3-70B-Instruct-Storywriter-exl2-rpcal
Quantized using 200 samples of 8192 tokens from an RP-oriented [PIPPA](https://huggingface.co/datasets/royallab/PIPPA-cleaned) dataset.

Branches:
- `main` -- `measurement.json`
- `2.25b6h` -- 2.25bpw, 6bit lm_head
- `3.5b6h` -- 3.5bpw, 6bit lm_head
- `3.75b6h` -- 3.75bpw, 6bit lm_head
- `4.5b6h` -- 4.5bpw, 6bit lm_head
- `4.65b6h` -- 4.65bpw, 6bit lm_head
- `6b6h` -- 6bpw, 6bit lm_head
- `8b8h` -- 8bpw, 8bit lm_head
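To fetch one of the quant branches, a `huggingface_hub` call along these lines should work. The `repo_id` and target directory below are placeholders, not values taken from this card:

```python
# Sketch: download a single quant branch with huggingface_hub.
# repo_id is a placeholder -- substitute this repository's actual Hub id.
from huggingface_hub import snapshot_download

snapshot_download(
    repo_id="your-org/Llama-3-70B-Instruct-Storywriter-exl2-rpcal",
    revision="4.5b6h",                                  # pick a bpw branch from the list above
    local_dir="Llama-3-70B-Instruct-Storywriter-4.5bpw",
)
```
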
Original model link: [tdrussell/Llama-3-70B-Instruct-Storywriter](https://huggingface.co/tdrussell/Llama-3-70B-Instruct-Storywriter)

Original model README below.

-----

# Llama 3 70B Instruct Storywriter
Llama 3 70B Instruct, further finetuned on a dataset consisting of books in the fiction genre.

This was just an experiment, but it turned out well enough that I'm sharing it. The finetuning has caused a significant shift in the model's writing style, and seems to have made it more creative. There may be a slight decrease in overall intelligence.

Because this was trained on Instruct, you can use the normal Instruct chat formatting. It may also work well in raw completion mode.
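For reference, the standard Llama 3 Instruct prompt layout looks roughly like the sketch below. This is the stock Llama 3 chat template, not anything specific to this finetune, and the system/user strings are purely illustrative:

```python
# Stock Llama 3 Instruct prompt layout (illustrative strings, not from this card).
system_prompt = "You are a creative story-writing assistant."
user_message = "Write the opening paragraph of a slow-burn mystery."

prompt = (
    "<|begin_of_text|><|start_header_id|>system<|end_header_id|>\n\n"
    f"{system_prompt}<|eot_id|>"
    "<|start_header_id|>user<|end_header_id|>\n\n"
    f"{user_message}<|eot_id|>"
    "<|start_header_id|>assistant<|end_header_id|>\n\n"
)
# The model's reply terminates with <|eot_id|>. For raw completion mode,
# skip the template entirely and feed plain prose for the model to continue.
```
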
## Training details
Trained on 4x RTX 4090s using [qlora-pipe](https://github.com/tdrussell/qlora-pipe).

The dataset consists of about 800 books in the fiction genre, totaling 570 MB of raw text.

Rank 64 QLoRA trained at 8192 sequence length.
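As a concrete illustration of what rank-64 QLoRA at an 8192-token sequence length involves, here is a minimal sketch using `transformers` + `peft`. This is not the author's qlora-pipe configuration; everything other than the rank and sequence length (alpha, dropout, target modules) is an assumed placeholder:

```python
# Minimal rank-64 QLoRA sketch with transformers + peft (NOT the qlora-pipe setup).
# Only r=64 and the 8192-token sequence length come from this card; the rest is assumed.
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer, BitsAndBytesConfig
from peft import LoraConfig, get_peft_model

model_id = "meta-llama/Meta-Llama-3-70B-Instruct"
max_seq_len = 8192

# 4-bit NF4 base weights: the "Q" in QLoRA.
bnb_config = BitsAndBytesConfig(
    load_in_4bit=True,
    bnb_4bit_quant_type="nf4",
    bnb_4bit_compute_dtype=torch.bfloat16,
)

tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(
    model_id, quantization_config=bnb_config, device_map="auto"
)

# Rank-64 LoRA adapters on the attention projections (target modules assumed).
lora_config = LoraConfig(
    r=64,
    lora_alpha=16,          # assumed
    lora_dropout=0.05,      # assumed
    target_modules=["q_proj", "k_proj", "v_proj", "o_proj"],
    task_type="CAUSAL_LM",
)
model = get_peft_model(model, lora_config)

# Training would then pack the ~570 MB book corpus into max_seq_len-token chunks
# and run an ordinary causal-LM fine-tuning loop over the adapter weights.
```
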
### Evaluation metrics

<img src="https://i.imgur.com/sCMjix4.png" width="800" />

## Why no 8B?
I tried multiple times to train this on Llama 3 8B Instruct, using a variety of hyperparameters. It never worked well. The model took a huge hit to intelligence every time, to the point of being unusable. 70B fared much better. I don't know why; maybe 8B is just too small for this type of technique and loses too much of the instruction-tuned smarts.