```
  e88 88e                               d8
 d888 888b  8888 8888  ,"Y88b 888 8e   d88
C8888 8888D 8888 8888 "8" 888 888 88b d88888
 Y888 888P  Y888 888P ,ee 888 888 888  888
  "88 88"    "88 88"  "88 888 888 888  888
      b
      8b,

 e88'Y88                   d8           888
d888  'Y   ,"Y88b 888,8,  d88    ,e e,  888
C8888     "8" 888 888 "  d88888 d88 88b 888
 Y888  ,d ,ee 888 888      888   888   , 888
  "88,d88 "88 888 888      888    "YeeP" 888

              PROUDLY PRESENTS
```
# Llama-3-70B-Instruct-Storywriter-exl2-rpcal

Quantized using 200 samples of 8192 tokens each from the RP-oriented [PIPPA-cleaned](https://huggingface.co/datasets/royallab/PIPPA-cleaned) dataset as calibration data.

Branches (each quant lives on its own branch; see the download sketch after the list):

- `main` -- `measurement.json` (exllamav2 measurement file)
- `2.25b6h` -- 2.25bpw, 6-bit lm_head
- `3.5b6h` -- 3.5bpw, 6-bit lm_head
- `3.75b6h` -- 3.75bpw, 6-bit lm_head
- `4.5b6h` -- 4.5bpw, 6-bit lm_head
- `4.65b6h` -- 4.65bpw, 6-bit lm_head
- `6b6h` -- 6bpw, 6-bit lm_head
- `8b8h` -- 8bpw, 8-bit lm_head
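
To fetch one of the quants, download the matching branch as a `revision`. A minimal sketch using `huggingface_hub` (the `REPO_ID` below is a placeholder; substitute this repository's actual path on the Hub):

```python
from huggingface_hub import snapshot_download

# Placeholder -- replace with the actual Hub path of this repository.
REPO_ID = "user/Llama-3-70B-Instruct-Storywriter-exl2-rpcal"

# Each quant lives on its own branch, so pass the branch name as `revision`.
snapshot_download(
    repo_id=REPO_ID,
    revision="4.65b6h",              # pick any bpw branch from the list above
    local_dir="Storywriter-4.65bpw",
)
```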

Original model link: [tdrussell/Llama-3-70B-Instruct-Storywriter](https://huggingface.co/tdrussell/Llama-3-70B-Instruct-Storywriter)

Original model README below.

-----

# Llama 3 70B Instruct Storywriter

Llama 3 70B Instruct, further finetuned on a dataset consisting of books in the fiction genre.

This was just an experiment, but it turned out well enough that I'm sharing it. The finetuning has caused a significant shift in the model's writing style and seems to have made it more creative. There may be a slight decrease in overall intelligence.

Because this was trained on Instruct, you can use the normal Llama 3 Instruct chat formatting. It may also work well in raw completion mode.
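
For reference, a minimal sketch of that chat formatting using the `transformers` chat template (the tokenizer path is just one source of the Llama 3 Instruct template; any copy of the Instruct tokenizer works):

```python
from transformers import AutoTokenizer

# Any Llama 3 Instruct tokenizer carries the chat template.
tok = AutoTokenizer.from_pretrained("meta-llama/Meta-Llama-3-70B-Instruct")

messages = [
    {"role": "system", "content": "You are a creative writing assistant."},
    {"role": "user", "content": "Write the opening paragraph of a mystery novel."},
]

# add_generation_prompt=True appends the assistant header so the model starts writing.
prompt = tok.apply_chat_template(messages, tokenize=False, add_generation_prompt=True)
print(prompt)
# -> <|begin_of_text|><|start_header_id|>system<|end_header_id|> ... <|eot_id|> ...
```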

## Training details

Trained on 4x RTX 4090s using [qlora-pipe](https://github.com/tdrussell/qlora-pipe).

The dataset consists of about 800 books in the fiction genre, totaling 570 MB of raw text.

A rank-64 QLoRA was trained at a sequence length of 8192.
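
qlora-pipe drives training from its own config file, so the following is only an illustrative sketch of roughly equivalent hyperparameters expressed with `peft` and `bitsandbytes`; only the rank and sequence length come from the details above, while alpha and the target modules are assumptions:

```python
import torch
from transformers import BitsAndBytesConfig
from peft import LoraConfig

# 4-bit NF4 quantization of the base model, as is typical for QLoRA (assumed settings).
bnb_config = BitsAndBytesConfig(
    load_in_4bit=True,
    bnb_4bit_quant_type="nf4",
    bnb_4bit_compute_dtype=torch.bfloat16,
)

# Rank-64 LoRA; alpha and module targets are illustrative, not the author's values.
lora_config = LoraConfig(
    r=64,
    lora_alpha=64,
    target_modules=["q_proj", "k_proj", "v_proj", "o_proj",
                    "gate_proj", "up_proj", "down_proj"],
    task_type="CAUSAL_LM",
)

MAX_SEQ_LEN = 8192  # training sequence length stated above
```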

### Evaluation metrics

<img src="https://i.imgur.com/sCMjix4.png" width="800" />

## Why no 8B?

I tried multiple times to train this on Llama 3 8B Instruct, using a variety of hyperparameters. It never worked well. The model took a huge hit to intelligence every time, to the point of being unusable. 70B fared much better. I don't know why; maybe 8B is just too small for this type of technique and loses too much of its instruction-tuned smarts.