
Gazelle v0.2 is the mid-March release from Tincans of a joint speech-language model.

This repo contains an experimental DPO finetune. To our knowledge, this is the first multi-modal DPO finetune of a speech-language model: audio in, text out.
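
For readers unfamiliar with DPO (Direct Preference Optimization), the per-example loss can be sketched as below. This is a generic illustration of the standard DPO objective, not the training code used for this model. The function name, argument names, and the `beta=0.1` default are assumptions, since the card does not state a beta value. The inputs are summed log-probabilities of the chosen and rejected text replies under the finetuned policy and a frozen reference model.

```python
import math


def dpo_loss(policy_chosen_logp, policy_rejected_logp,
             ref_chosen_logp, ref_rejected_logp, beta=0.1):
    """Per-example DPO loss: -log(sigmoid(beta * margin)).

    beta=0.1 is an assumed default; the model card does not give one.
    """
    # Implicit rewards: how far the policy has moved from the reference.
    chosen_reward = policy_chosen_logp - ref_chosen_logp
    rejected_reward = policy_rejected_logp - ref_rejected_logp
    margin = beta * (chosen_reward - rejected_reward)
    # Numerically stable form of -log(sigmoid(margin)).
    if margin >= 0:
        return math.log1p(math.exp(-margin))
    return -margin + math.log1p(math.exp(margin))


# When policy and reference agree, the loss equals log(2).
print(dpo_loss(-10.0, -12.0, -10.0, -12.0))
```

The loss drops below log(2) once the policy assigns relatively more probability to the chosen reply than the reference does.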

The datasets used were snorkelai/Snorkel-Mistral-PairRM-DPO-Dataset (first iteration) and jondurbin/truthy-dpo-v0.1. We trained for 2 epochs with a peak learning rate of 3e-4 (max_lr), batch size 32, 10 warmup steps, and cosine decay.
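
The learning-rate schedule described above can be sketched as follows. This is a minimal illustration, not the actual training code. It assumes linear warmup and decay all the way to zero, neither of which the card specifies. The function name `lr_at_step` and the `total_steps` value in the usage line are hypothetical, since the total step count depends on dataset size.

```python
import math


def lr_at_step(step, total_steps, max_lr=3e-4, warmup_steps=10):
    """Learning rate at a given optimizer step (0-indexed).

    Assumes linear warmup to max_lr, then cosine decay to zero.
    """
    if step < warmup_steps:
        # Linear warmup: reaches max_lr on the last warmup step.
        return max_lr * (step + 1) / warmup_steps
    # Fraction of the decay phase completed, clamped to [0, 1].
    decay_steps = max(1, total_steps - warmup_steps)
    progress = min(1.0, (step - warmup_steps) / decay_steps)
    return 0.5 * max_lr * (1.0 + math.cos(math.pi * progress))


# Hypothetical run length of 1010 steps, for illustration only.
for s in (0, 9, 510, 1010):
    print(s, lr_at_step(s, total_steps=1010))
```

With these assumptions the rate peaks at 3e-4 after the 10 warmup steps and falls to half of that at the midpoint of the decay phase.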

We can see some tell-tale signs of preference modeling at play, particularly longer replies, which the base instruction-tuned model does not produce. Overall, we view the quality as mixed. We welcome experimentation but do not recommend production use.

Please see this notebook for an inference example.

Model size: 7.37B params (Safetensors)
Tensor type: BF16
