KTO offers an easier way to preference-train LLMs (only binary 👍/👎 ratings are required). As part of #DataIsBetterTogether, I've written a tutorial on creating a preference dataset using Argilla and Spaces.
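For context, TRL's KTO trainer works on unpaired examples that hold just a prompt, a completion, and a binary label. Here is a minimal sketch of what such records could look like; the haiku text is purely illustrative:

```python
# Minimal sketch of the unpaired "prompt / completion / label" format
# used for KTO-style training; the example text is illustrative only.
from datasets import Dataset

kto_examples = Dataset.from_dict(
    {
        "prompt": [
            "Write a haiku about autumn.",
            "Write a haiku about autumn.",
        ],
        "completion": [
            "Crisp leaves drift down\nA cool wind carries the geese\nDusk comes earlier",
            "Autumn is the season that comes after summer and before winter.",
        ],
        # One boolean per completion: True = 👍 (desirable), False = 👎 (undesirable)
        "label": [True, False],
    }
)
print(kto_examples[0])
```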
Using this approach, you can create a dataset that anyone with a Hugging Face account can contribute to 🤯
See an example of the kind of Space you can create by following this tutorial here: davanstrien/haiku-preferences
📝 New tutorial covers:
💬 Generating responses with open models
👥 Collecting human feedback (do you like this model response? Yes/No)
🤖 Preparing a TRL-compatible dataset for training aligned models (see the sketch below)
Check it out here: https://github.com/huggingface/data-is-better-together/tree/main/kto-preference
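To give a flavour of that last step, here is a rough sketch of how a dataset in the format above could be fed to TRL's KTOTrainer. The model ID and dataset ID are placeholders, and argument names such as `processing_class` can differ slightly between TRL versions:

```python
# Rough sketch: fine-tuning a small open model with TRL's KTOTrainer on a
# binary-feedback dataset. The model/dataset IDs are placeholders, and the
# exact KTOTrainer/KTOConfig arguments may vary with your TRL version.
from datasets import load_dataset
from transformers import AutoModelForCausalLM, AutoTokenizer
from trl import KTOConfig, KTOTrainer

model_id = "Qwen/Qwen2-0.5B-Instruct"  # any small instruct model works for a test run
model = AutoModelForCausalLM.from_pretrained(model_id)
tokenizer = AutoTokenizer.from_pretrained(model_id)

# Expects the unpaired "prompt" / "completion" / "label" columns shown earlier.
train_dataset = load_dataset("your-username/your-kto-preferences", split="train")

training_args = KTOConfig(output_dir="kto-aligned-model", logging_steps=10)
trainer = KTOTrainer(
    model=model,
    args=training_args,
    processing_class=tokenizer,  # older TRL releases use `tokenizer=` instead
    train_dataset=train_dataset,
)
trainer.train()
```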