metadata

datasets:
  - gozfarb/ShareGPT_Vicuna_unfiltered
license: other
inference: false

VicUnlocked-30B-LoRA GGML

These are GGML-format 4-bit, 5-bit and 8-bit quantised models of Neko Institute of Science's VicUnLocked 30B LoRA.

The files in this repo are the result of merging the above LoRA with the original LLaMA 30B, then converting to GGML for CPU (+ CUDA) inference using llama.cpp.

Repositories available

THESE FILES REQUIRE LATEST LLAMA.CPP (May 12th 2023 - commit b9fd7ee)!

llama.cpp recently made a breaking change to its quantisation methods.

I have quantised the GGML files in this repo with the latest version. Therefore you will require llama.cpp compiled on May 12th or later (commit b9fd7ee or later) to use them.
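If llama.cpp was built from a local git clone, the commit requirement can be checked before loading a model. The following is a minimal sketch, assuming the clone has enough history for git to resolve `b9fd7ee`; the function name `has_required_commit` and the injectable `run` parameter are illustrative, not part of llama.cpp.

```python
import subprocess

REQUIRED_COMMIT = "b9fd7ee"  # commit named in the notice above


def has_required_commit(repo_dir: str, run=subprocess.run) -> bool:
    """True if REQUIRED_COMMIT is HEAD or an ancestor of HEAD in repo_dir.

    `run` defaults to subprocess.run and is only a parameter so the check
    can be exercised without a real checkout (an assumption of this sketch).
    """
    result = run(
        ["git", "merge-base", "--is-ancestor", REQUIRED_COMMIT, "HEAD"],
        cwd=repo_dir,
    )
    # git exits with 0 for "is an ancestor", 1 for "is not an ancestor",
    # and another non-zero code on error (for example an unknown commit).
    return result.returncode == 0
```

Calling `has_required_commit("path/to/llama.cpp")` with the path of your own clone returns `False` both for an older checkout and for a clone where the commit cannot be resolved, so a `False` result means "rebuild or re-clone" rather than a precise diagnosis.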

Provided files

Name	Quant method	Bits	Size	RAM required	Use case
`VicUnlocked-30B-LoRA.ggml.q4_0.bin`	q4_0	4bit	20.3GB	23GB	4-bit.
`VicUnlocked-30B-LoRA.ggml.q4_1.bin`	q4_1	4bit	24.4GB	27GB	4-bit. Higher accuracy than q4_0 but not as high as q5_0. However, it has quicker inference than the q5 models.
`VicUnlocked-30B-LoRA.ggml.q5_0.bin`	q5_0	5bit	22.4GB	25GB	5-bit. Higher accuracy, higher resource usage and slower inference.
`VicUnlocked-30B-LoRA.ggml.q5_1.bin`	q5_1	5bit	24.4GB	27GB	5-bit. Even higher accuracy, and higher resource usage and slower inference.
`VicUnlocked-30B-LoRA.ggml.q8_0.bin`	q8_0	8bit	36.6GB	39GB	8-bit. Almost indistinguishable from float16. Huge resource use and slow. Not recommended for normal use.
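As a rough aid for choosing a file, the sketch below picks the most accurate quantisation whose "RAM required" figure fits a given budget. The file names and RAM figures are copied from the table; the accuracy ordering follows the "Use case" column, and the helper name `pick_quant` and its `allow_q8` switch are hypothetical.

```python
from typing import Optional

# (file name, RAM required in GB), copied from the table and listed from
# lowest to highest accuracy as the "Use case" column describes them.
QUANT_FILES = [
    ("VicUnlocked-30B-LoRA.ggml.q4_0.bin", 23),
    ("VicUnlocked-30B-LoRA.ggml.q4_1.bin", 27),
    ("VicUnlocked-30B-LoRA.ggml.q5_0.bin", 25),
    ("VicUnlocked-30B-LoRA.ggml.q5_1.bin", 27),
    ("VicUnlocked-30B-LoRA.ggml.q8_0.bin", 39),
]


def pick_quant(ram_budget_gb: float, allow_q8: bool = False) -> Optional[str]:
    """Return the most accurate listed file that fits the RAM budget, or None."""
    best = None
    for name, ram_gb in QUANT_FILES:
        if not allow_q8 and "q8_0" in name:
            continue  # the table marks q8_0 as not recommended for normal use
        if ram_gb <= ram_budget_gb:
            best = name  # later entries are more accurate, so keep overwriting
    return best


print(pick_quant(26))  # VicUnlocked-30B-LoRA.ggml.q5_0.bin
```

The budget should be the RAM you can actually spare for the model, not the total installed in the machine.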

How to run in `llama.cpp`

I use the following command line; adjust for your tastes and needs:

./main -t 8 -m VicUnlocked-30B-LoRA.ggml.q5_0.bin --color -c 2048 --temp 0.7 --repeat_penalty 1.1 -n -1 -p "### Instruction: write a story about llamas ### Response:"

Change -t 8 to the number of physical CPU cores you have.
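For scripted use, the same command line can be assembled programmatically. This is a minimal sketch that reuses only the flags shown above; the helper names `format_prompt` and `build_main_args` are made up for illustration. The thread count is passed in explicitly because Python's `os.cpu_count()` reports logical CPUs, which can be higher than the number of physical cores.

```python
import shlex


def format_prompt(instruction: str) -> str:
    """Wrap an instruction in the prompt layout used by the command above."""
    return f"### Instruction: {instruction} ### Response:"


def build_main_args(model_path: str, threads: int, instruction: str) -> list:
    """Assemble the ./main argument list with the same flags as the command above."""
    return [
        "./main",
        "-t", str(threads),          # number of physical CPU cores
        "-m", model_path,
        "--color",
        "-c", "2048",
        "--temp", "0.7",
        "--repeat_penalty", "1.1",
        "-n", "-1",
        "-p", format_prompt(instruction),
    ]


args = build_main_args(
    "VicUnlocked-30B-LoRA.ggml.q5_0.bin", 8, "write a story about llamas"
)
print(shlex.join(args))  # shell-quoted command line, ready to paste
```

The list form can also be handed to `subprocess.run(args)` from inside the llama.cpp directory, which avoids shell quoting problems with the prompt.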

How to run in `text-generation-webui`

GGML models can be loaded into text-generation-webui by installing the llama.cpp module, then placing the ggml model file in a model folder as usual.

Further instructions here: text-generation-webui/docs/llama.cpp-models.md.

Original model card

Convert tools

https://github.com/practicaldreamer/vicuna_to_alpaca

Training tool

https://github.com/oobabooga/text-generation-webui

ATM I'm using 2023.05.04v0 of the dataset and training full context.

Notes:

So I will only be training 1 epoch, as full context 30b takes so long to train. This 1 epoch will take me 8 days lol, but luckily these LoRAs feel fully functional at epoch 1, as shown on my 13b one. Also, I will be uploading checkpoints almost every day. I could train another epoch if there's enough want for it.

Update: Since I will not be training over 1 epoch, @Aeala is training for the full 3: https://huggingface.co/Aeala/VicUnlocked-alpaca-half-30b-LoRA but it's half ctx if you care about that. Also @Aeala's just about done.

Update: Training finished at epoch 1. These 8 days sure felt long. I only have one A6000, lads; there's only so much I can do. Also RIP gozfarb, IDK what happened to him.

How to test?

Download LLaMA-30B-HF if you have not: https://huggingface.co/Neko-Institute-of-Science/LLaMA-30B-HF
Make a folder called VicUnLocked-30b-LoRA in the loras folder.
Download adapter_config.json and adapter_model.bin into VicUnLocked-30b-LoRA.
Load ooba: python server.py --listen --model LLaMA-30B-HF --load-in-8bit --chat --lora VicUnLocked-30b-LoRA
Select instruct and choose the Vicuna-v1.1 template.
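Before launching, it can help to confirm the adapter files ended up in the right place. The sketch below assumes it is pointed at the text-generation-webui root, with the `loras` folder directly beneath it; the helper names `missing_lora_files` and `server_command` are hypothetical, and the command is the one from the steps above expressed as an argument list.

```python
from pathlib import Path

LORA_NAME = "VicUnLocked-30b-LoRA"
LORA_FILES = ("adapter_config.json", "adapter_model.bin")


def missing_lora_files(webui_root: str) -> list:
    """List the adapter files not yet present in loras/VicUnLocked-30b-LoRA."""
    lora_dir = Path(webui_root) / "loras" / LORA_NAME
    return [name for name in LORA_FILES if not (lora_dir / name).is_file()]


def server_command() -> list:
    """The launch command from the steps above, as an argument list."""
    return [
        "python", "server.py", "--listen",
        "--model", "LLaMA-30B-HF",
        "--load-in-8bit", "--chat",
        "--lora", LORA_NAME,
    ]
```

An empty list from `missing_lora_files(".")`, run from the webui root, means both adapter files are in place and `server_command()` can be passed to `subprocess.run`.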

Training Log

https://wandb.ai/neko-science/VicUnLocked/runs/vx8yzwi7