tincans-ai/gazelle-v0.2-dpo

hingeloss committed on Mar 19

Commit 63c6be8 • 1 Parent(s): d861746

Create README.md

Files changed (1)

README.md +14 -0

	@@ -0,0 +1,14 @@

+---
+license: apache-2.0
+language:
+- en
+---
+Gazelle v0.2 is the mid-March release from [Tincans](https://tincans.ai) of a joint speech-language model.
+This repo contains an experimental DPO finetune. To our knowledge, this is the first multi-modal DPO finetune of a speech-language model (audio in, text out).
+The datasets used were [snorkelai/Snorkel-Mistral-PairRM-DPO-Dataset](https://huggingface.co/datasets/snorkelai/Snorkel-Mistral-PairRM-DPO-Dataset) and [jondurbin/truthy-dpo-v0.1](https://huggingface.co/datasets/jondurbin/truthy-dpo-v0.1?row=0). We trained for 2 epochs with a learning rate of 3e-4, batch size 32, 10 warmup steps, and cosine decay.
+We can see some tell-tale signs of preference modeling at play, particularly longer replies, which do not appear in the base instruction-tuned model. Overall, we consider the quality mixed: we welcome experimentation but do not recommend production use.
+Please see [this notebook](https://github.com/tincans-ai/gazelle/blob/2939d7034277506171d61a7a1001f535426faa71/examples/infer.ipynb) for an inference example.
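
The README gives the schedule only as "lr=3e-4, 10 warmup steps, cosine decay". As a rough illustration of what such a schedule could look like, here is a minimal sketch. The function name `lr_at_step`, the linear warmup shape, the decay to zero, and the `total_steps` value are all assumptions for illustration; the README does not say which scheduler implementation or step count was actually used.

```python
import math


def lr_at_step(step, total_steps, peak_lr=3e-4, warmup_steps=10):
    """Learning rate at a given optimizer step (0-indexed).

    Assumed shape: linear warmup to peak_lr over warmup_steps,
    then cosine decay from peak_lr down to zero at total_steps.
    """
    if step < warmup_steps:
        # Linear ramp: reaches peak_lr on the last warmup step.
        return peak_lr * (step + 1) / warmup_steps
    # Fraction of the decay phase completed, clamped to [0, 1].
    progress = (step - warmup_steps) / max(1, total_steps - warmup_steps)
    progress = min(1.0, progress)
    return peak_lr * 0.5 * (1.0 + math.cos(math.pi * progress))


if __name__ == "__main__":
    total_steps = 1000  # hypothetical; the real step count is not stated
    for s in (0, 9, 10, 505, 1000):
        print(s, lr_at_step(s, total_steps))
```

With batch size 32 and 2 epochs, the real `total_steps` would be roughly twice the number of training pairs divided by 32, but the README does not give the dataset size that was used.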