Curious about the training parameters and methods.
#3, opened by waytohou
Thank you for your great work. I'm curious about the training method used: was it QLoRA or full fine-tuning? What other parameters were used for training?
Hey,
Thanks for the kind words. The learning algorithm is proprietary (closed-source); it is a PEFT method that allows for continual learning. The loss function/framework we used is DPO, with the UltraFeedback binarized dataset. So everything is standard except the learning algorithm.
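For readers unfamiliar with DPO, here is a minimal sketch of the standard per-example DPO loss, computed from summed log-probabilities of the chosen and rejected responses. This is not the training code used for this model: the function name `dpo_loss`, its argument names, and the default `beta=0.1` are assumptions for illustration, and the proprietary PEFT algorithm is not represented here.

```python
import math


def dpo_loss(policy_chosen_logp, policy_rejected_logp,
             ref_chosen_logp, ref_rejected_logp, beta=0.1):
    """Standard DPO loss for one preference pair (illustrative sketch).

    Each argument is the summed log-probability of a full response under
    either the policy being trained or the frozen reference model.
    beta=0.1 is an assumed value, not one confirmed in this thread.
    """
    # How much more the policy favors each response than the reference does.
    chosen_logratio = policy_chosen_logp - ref_chosen_logp
    rejected_logratio = policy_rejected_logp - ref_rejected_logp
    margin = beta * (chosen_logratio - rejected_logratio)
    # loss = -log(sigmoid(margin)), written in a numerically stable form.
    if margin > 0:
        return math.log1p(math.exp(-margin))
    return -margin + math.log1p(math.exp(margin))


# When the policy matches the reference, the margin is zero and the loss is log(2).
baseline = dpo_loss(-2.0, -2.0, -2.0, -2.0)
# When the policy favors the chosen response more than the reference does,
# the loss falls below that baseline.
improved = dpo_loss(-1.0, -3.0, -2.0, -2.0)
```

In a dataset like UltraFeedback binarized, each example supplies one prompt with a chosen and a rejected response, so a loss of this form would be averaged over such pairs.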