KTO offers an easier way to preference-train LLMs (only binary 👍/👎 ratings are required). As part of #DataIsBetterTogether, I've written a tutorial on creating a preference dataset using Argilla and Spaces.
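For context, TRL's KTO trainer works on unpaired examples that hold just a prompt, a completion, and a binary label. Here is a minimal sketch of what such records could look like; the haiku text is purely illustrative:

```python
# Minimal sketch of the unpaired "prompt / completion / label" format
# used for KTO-style training; the example text is illustrative only.
from datasets import Dataset

kto_examples = Dataset.from_dict(
    {
        "prompt": [
            "Write a haiku about autumn.",
            "Write a haiku about autumn.",
        ],
        "completion": [
            "Crisp leaves drift down\nA cool wind carries the geese\nDusk comes earlier",
            "Autumn is the season that comes after summer and before winter.",
        ],
        # One boolean per completion: True = 👍 (desirable), False = 👎 (undesirable)
        "label": [True, False],
    }
)
print(kto_examples[0])
```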
Using this approach, you can create a dataset that anyone with a Hugging Face account can contribute to 🤯
See an example of the kind of Space you can create by following this tutorial here: davanstrien/haiku-preferences
📝 New tutorial covers:
💬 Generating responses with open models
👥 Collecting human feedback (do you like this model response? Yes/No)
🤖 Preparing a TRL-compatible dataset for training aligned models (see the sketch below)
Check it out here: https://github.com/huggingface/data-is-better-together/tree/main/kto-preference
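To give a flavour of that last step, here is a rough sketch of how a dataset in the format above could be fed to TRL's KTOTrainer. The model ID and dataset ID are placeholders, and argument names such as `processing_class` can differ slightly between TRL versions:

```python
# Rough sketch: fine-tuning a small open model with TRL's KTOTrainer on a
# binary-feedback dataset. The model/dataset IDs are placeholders, and the
# exact KTOTrainer/KTOConfig arguments may vary with your TRL version.
from datasets import load_dataset
from transformers import AutoModelForCausalLM, AutoTokenizer
from trl import KTOConfig, KTOTrainer

model_id = "Qwen/Qwen2-0.5B-Instruct"  # any small instruct model works for a test run
model = AutoModelForCausalLM.from_pretrained(model_id)
tokenizer = AutoTokenizer.from_pretrained(model_id)

# Expects the unpaired "prompt" / "completion" / "label" columns shown earlier.
train_dataset = load_dataset("your-username/your-kto-preferences", split="train")

training_args = KTOConfig(output_dir="kto-aligned-model", logging_steps=10)
trainer = KTOTrainer(
    model=model,
    args=training_args,
    processing_class=tokenizer,  # older TRL releases use `tokenizer=` instead
    train_dataset=train_dataset,
)
trainer.train()
```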