|
--- |
|
license: apache-2.0 |
|
library_name: peft |
|
tags: |
|
- not-for-all-audiences |
|
--- |
|
|
|
# LimaRP-ShareGPT-13b-loras |
|
|
|
This is a repository of my Llama-2-13b QLora checkpoints based on the LimaRP dataset converted to the ShareGPT format.
|
|
|
## Disclaimer |
|
|
|
This is a **highly experimental** QLora test. If you want to use the LimaRP lora, please [look here instead](https://huggingface.co/lemonilia/limarp-llama2-v2). Lemonilia's Lora uses the Alpaca format.
|
|
|
## Why? |
|
|
|
LimaRP is a high-quality lora with a dataset of human RP examples. However, at a weight of 1.0 it can come on too strong and overwrite a character's personality, so lower weights are required when merging models. We wanted to see what would happen when formatting the dataset using ShareGPT, a format that natively supports turn-based conversations, unlike Alpaca, which requires newline hackery.
|
|
|
In addition, we wanted to see how various system prompts affect the end result of a lora finetune along with the use of character names as roles rather than the standard `USER` and `ASSISTANT`. |
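For illustration, a ShareGPT-style conversation entry with character names as roles might look like the sketch below. The field names (`conversations`, `from`, `value`) follow the common ShareGPT convention, and the sample text is hypothetical; the actual entries produced by the conversion script may differ.

```python
# Hypothetical ShareGPT-style entry using character names as roles
# instead of the standard USER/ASSISTANT labels.
example = {
    "conversations": [
        {"from": "system", "value": "Enter roleplay mode..."},
        {"from": "User", "value": "Hello there!"},
        {"from": "Character", "value": "*waves* What brings you here?"},
    ]
}

# The role of each turn comes from the "from" field, so nothing
# forces a fixed USER/ASSISTANT pair.
roles = [turn["from"] for turn in example["conversations"]]
print(roles)  # ['system', 'User', 'Character']
```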
|
|
|
## Roles |
|
- kingbri: Rewrote dataset creation script, trained all loras, reformatted dataset. |
|
- Alicat: Provided insight on system prompts and dataset formatting guidelines. |
|
- Argo: Provided insight on system prompts. |
|
|
|
## Variance |
|
|
|
For system prompts, please see the appropriate folder READMEs. |
|
|
|
One char = Character's persona in system prompt. |
|
Two char = Character and User's persona in system prompt. |
|
The scenario is always included. |
|
|
|
In addition, the data-preparation script shuffles the dataset before exporting it, which means different portions were used for eval during training. Therefore, each lora used a differently shuffled version of LimaRP's 4k dataset. The randomization of which entries go to eval should not affect the results.
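The shuffle-then-split step described above can be sketched as follows. This is a minimal illustration, not the actual script; the function name, eval fraction, and seed handling are all assumptions.

```python
import random

def split_dataset(entries, eval_fraction=0.1, seed=None):
    """Shuffle a copy of the dataset, then carve off an eval slice.

    A different seed sends different entries to eval, which is the
    randomization the text refers to.
    """
    shuffled = entries[:]  # copy so the caller's list is untouched
    random.Random(seed).shuffle(shuffled)
    eval_count = max(1, int(len(shuffled) * eval_fraction))
    return shuffled[eval_count:], shuffled[:eval_count]

train, eval_set = split_dataset(list(range(100)), eval_fraction=0.1, seed=42)
print(len(train), len(eval_set))  # 90 10
```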
|
|
|
## Notes |
|
|
|
These QLoras were produced as an experiment to see how varying versions of LimaRP can affect a model. Please take this data with a grain of salt. You can test these yourself and draw your own conclusions.
|
|
|
### Architecture |
|
|
|
- **Model Architecture**: Llama-2-13b |
|
- **Training Algorithm**: QLora |
|
|
|
### Training Details |
|
|
|
- **Dataset**: LimaRP formatted with this [script](https://gist.github.com/bdashore3/4c9f3a812c1a68013fdb23e1179c7765) |
|
- **Dataset type**: ShareGPT
|
- **Training Parameters**: [See Here](https://gist.github.com/bdashore3/ab6cd21777a30fb9b131bc7b2f6b8949) |
|
- **Training Environment**: Axolotl |
|
- **sequence_len**: 4096 |
|
|
|
## Instruct Format |
|
|
|
ShareGPT gets converted to the Vicuna format. The roles were character names during training, so there's no fixed `USER` or `ASSISTANT` role. You can probably use the following and it should work:
|
|
|
``` |
|
SYSTEM: Enter roleplay mode... |
|
User: {prompt} |
|
Character: |
|
``` |
|
|
|
Not using instruct mode is also an option, and is preferred.
|
|
|
## Acknowledgments |
|
|
|
Thanks to: |
|
- Lemonilia: Original creator of the LimaRP dataset and Lora |
|
- Axolotl: Finetuning suite |
|
|
|
## Donate? |
|
All my infrastructure and cloud expenses are paid out of pocket. If you'd like to donate, you can do so here: [https://ko-fi.com/kingbri](https://ko-fi.com/kingbri) |
|
|
|
You should not feel obligated to donate, but if you do, I'd appreciate it. |
|
|
|
## Axolotl stuff |
|
|
|
All Axolotl configuration files are located within each lora's folder.
|
|