Curious about the training parameters and methods.

by waytohou - opened

Thank you for your great work. I'm curious about the training method used: was it QLoRA or full fine-tuning? What other parameters were used for training?


Thanks for the kind words. The learning algorithm is proprietary (closed-source); it is a PEFT method that allows for continual learning. The loss function/framework we used is DPO, with the UltraFeedback Binarized dataset. So everything is standard except the learning algorithm.
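
For readers who haven't worked with DPO, here is a minimal sketch of the standard per-pair DPO loss in plain Python. It only illustrates the publicly known loss, not the proprietary learning algorithm mentioned above. The function name `dpo_loss` and the default `beta=0.1` are assumptions chosen for illustration, not the values used for this model.

```python
import math


def dpo_loss(policy_chosen_logp, policy_rejected_logp,
             ref_chosen_logp, ref_rejected_logp, beta=0.1):
    """Standard DPO loss for a single (chosen, rejected) preference pair.

    Each argument is the summed log-probability of a full response under
    either the policy being trained or the frozen reference model.
    beta=0.1 is an illustrative default, not a value from this thread.
    """
    # How much more (or less) likely each response is under the policy
    # than under the reference model.
    chosen_logratio = policy_chosen_logp - ref_chosen_logp
    rejected_logratio = policy_rejected_logp - ref_rejected_logp

    # Scaled preference margin between the chosen and rejected responses.
    margin = beta * (chosen_logratio - rejected_logratio)

    # -log(sigmoid(margin)) == log(1 + exp(-margin)), computed stably.
    if margin >= 0:
        return math.log1p(math.exp(-margin))
    return -margin + math.log1p(math.exp(margin))


if __name__ == "__main__":
    # Hypothetical log-probabilities: the policy favors the chosen response
    # more strongly than the reference does, so the loss is below log(2).
    print(dpo_loss(-1.0, -5.0, -2.0, -2.0))
```

When the policy and reference agree exactly, the margin is zero and the loss equals log(2); it decreases as the policy shifts probability toward the chosen response relative to the reference.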
