Training details?

#8
by max1321 - opened

This model is amazing! I love it! Thank you for your hard work!!

I have two questions:

  1. You said this model is based on fine-tuning Llama 3. Which Python script did you use, and could you tell us how long the training took?
  2. Is it possible to train further based on the current model?

Thanks for your interest in the model, @max1321!

Although I no longer have the exact script I used to fine-tune it, I remember the details: I used a LoRA adapter of rank 16 from the peft library and trained for 1 epoch over this dataset: https://huggingface.co/datasets/ResplendentAI/NSFW_RP_Format_DPO, using the DPOTrainer from the trl library. All the other hyperparameters were the defaults.
Here is a similar fine-tuning script of mine that uses the same libraries, though for a different dataset/model: https://github.com/vicgalle/configurable-safety-tuning/blob/main/cst_train.py
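
For reference, a minimal sketch of that setup could look like the following. This is not the original script: the base model name is an assumption, and the exact trainer argument names (e.g. processing_class vs. tokenizer) vary across trl versions, so adjust to your installed versions.

```python
# Sketch of the described setup: rank-16 LoRA adapter + 1 epoch of DPO
# over the preference dataset linked above. Assumptions: base model name,
# a recent trl release (DPOConfig / processing_class); everything else default.
from datasets import load_dataset
from peft import LoraConfig
from transformers import AutoModelForCausalLM, AutoTokenizer
from trl import DPOConfig, DPOTrainer

model_name = "meta-llama/Meta-Llama-3-8B-Instruct"  # assumed base model
model = AutoModelForCausalLM.from_pretrained(model_name)
tokenizer = AutoTokenizer.from_pretrained(model_name)

# DPO preference data with prompt / chosen / rejected columns
dataset = load_dataset("ResplendentAI/NSFW_RP_Format_DPO", split="train")

# Rank-16 LoRA adapter; other LoRA settings left at their defaults
peft_config = LoraConfig(r=16, task_type="CAUSAL_LM")

# 1 training epoch, default DPO hyperparameters otherwise
args = DPOConfig(output_dir="dpo-lora-out", num_train_epochs=1)

trainer = DPOTrainer(
    model=model,
    args=args,
    train_dataset=dataset,
    processing_class=tokenizer,
    peft_config=peft_config,  # trl wraps the model with the LoRA adapter
)
trainer.train()
```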

And yes, it should be possible to continue fine-tuning with this model as a base, as overall performance on general tasks hasn't degraded.
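
A quick, hypothetical sketch of what that could look like, attaching a fresh LoRA adapter on top of the released weights (the repo id below is only a placeholder for this model's Hub name):

```python
# Hypothetical continuation sketch: load the fine-tuned model as the new base,
# attach a fresh LoRA adapter, then plug it into your own SFT/DPO trainer.
from peft import LoraConfig, get_peft_model
from transformers import AutoModelForCausalLM

base = AutoModelForCausalLM.from_pretrained("user/this-model")  # placeholder repo id
adapter = LoraConfig(r=16, task_type="CAUSAL_LM")
model = get_peft_model(base, adapter)
model.print_trainable_parameters()  # only the new adapter weights are trainable
```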

Btw, just in case, I've released a similar model, trained on more data and using the newer Llama-3.1 as the base: vicgalle/Humanish-Roleplay-Llama-3.1-8B
