---
license: llama3
datasets:
- HuggingFaceH4/ultrafeedback_binarized
language:
- en
library_name: transformers
tags:
- tenyx-fine-tuning
- dpo
- tenyxchat
- llama3
pipeline_tag: text-generation
---

# TenyxChat: Language Model Alignment using Tenyx Fine-tuning

Introducing Llama3-TenyxChat-70B, part of our TenyxChat series of models trained to function as useful assistants through preference tuning, using Tenyx's advanced fine-tuning technology ([VentureBeat article](https://venturebeat.com/ai/tenyx-aims-to-fix-llms-catastrophic-forgetting-problem/)). The model is trained using the [Direct Preference Optimization (DPO)](https://arxiv.org/abs/2305.18290) framework on the open-source AI feedback dataset [UltraFeedback](https://huggingface.co/datasets/HuggingFaceH4/ultrafeedback_binarized).

We fine-tune [Llama3-70B](https://huggingface.co/meta-llama/Meta-Llama-3-70B-Instruct) with our proprietary approach, which increases the [MT-Bench](https://arxiv.org/abs/2306.05685)* score without degrading the model's performance on other benchmarks.
Our approach aims to mitigate forgetting in LLMs in a computationally efficient manner,
thereby enabling continual fine-tuning without altering the pre-trained output distribution.
Llama3-TenyxChat-70B was trained on eight A100 (80GB) GPUs for fifteen hours, with a training setup obtained from HuggingFaceH4 ([GitHub](https://github.com/huggingface/alignment-handbook)).

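
For readers unfamiliar with DPO, the per-example loss described in the DPO paper can be sketched in a few lines of standard-library Python. This only illustrates the published objective and is not Tenyx's proprietary fine-tuning method; the function name `dpo_loss`, the `beta` default, and the log-probability values are illustrative assumptions.

```python
import math

def dpo_loss(policy_chosen_logp, policy_rejected_logp,
             ref_chosen_logp, ref_rejected_logp, beta=0.1):
    """Per-example DPO loss: -log sigmoid(beta * (policy log-ratio - reference log-ratio))."""
    policy_logratio = policy_chosen_logp - policy_rejected_logp
    ref_logratio = ref_chosen_logp - ref_rejected_logp
    margin = beta * (policy_logratio - ref_logratio)
    # -log(sigmoid(x)) == log(1 + exp(-x)); log1p keeps this accurate for large margins.
    return math.log1p(math.exp(-margin))

# Made-up sequence log-probabilities for one (chosen, rejected) response pair.
loss_at_init = dpo_loss(-12.0, -15.0, -12.0, -15.0)   # policy identical to the reference
loss_improved = dpo_loss(-10.0, -18.0, -12.0, -15.0)  # policy favors the chosen response more
print(loss_at_init, loss_improved)
```

When the policy matches the reference model the loss equals log 2, and it falls as the policy widens the gap between the chosen and rejected responses relative to the reference.
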

*The MT-Bench evaluation we perform follows the latest eval upgrade proposed in [this PR](https://github.com/lm-sys/FastChat/pull/3158). The PR upgrades the judge from `GPT-4-0613` to `GPT-4-preview-0125` (the latest version at the time of release) and corrects and improves the reference answers for a subset of questions. These changes are needed to correct erroneous ratings in the previous evaluation.

**Model Developers** [Tenyx Research](https://www.tenyx.com/research)


# Model details

- Model type: Fine-tuned 70B Instruct model for chat.
- License: Meta Llama 3 Community License
- Base model: [Llama3-70B](https://huggingface.co/meta-llama/Meta-Llama-3-70B-Instruct)
- Demo: Coming Soon!


## Usage

Our model uses the same chat template as [Llama3-70B](https://huggingface.co/meta-llama/Meta-Llama-3-70B-Instruct).

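
To show what that template produces, here is a minimal sketch of the Llama 3 Instruct prompt layout in plain Python. The helper name `format_llama3_prompt` is hypothetical, and the sketch assumes the standard Llama 3 special tokens; `tokenizer.apply_chat_template` remains the authoritative implementation and should be preferred in practice.

```python
def format_llama3_prompt(messages, add_generation_prompt=True):
    """Hypothetical helper: render chat messages in the Llama 3 Instruct layout."""
    prompt = "<|begin_of_text|>"
    for message in messages:
        # Each turn is a role header, a blank line, the trimmed content, and an end-of-turn token.
        prompt += f"<|start_header_id|>{message['role']}<|end_header_id|>\n\n"
        prompt += f"{message['content'].strip()}<|eot_id|>"
    if add_generation_prompt:
        # An open assistant header cues the model to write the next turn.
        prompt += "<|start_header_id|>assistant<|end_header_id|>\n\n"
    return prompt

example = format_llama3_prompt([
    {"role": "system", "content": "You are a helpful assistant."},
    {"role": "user", "content": "Hi. I would like to make a hotel booking."},
])
print(example)
```

Building the string by hand is mainly useful for inspecting or debugging prompts; relying on the tokenizer's template avoids drift if the template ever changes.
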

### Hugging Face Example

```python
import torch
from transformers import pipeline

# Load the model in bfloat16 and spread it across the available GPUs.
pipe = pipeline("text-generation", model="tenyx/Llama3-TenyxChat-70B", torch_dtype=torch.bfloat16, device_map="auto")

messages = [
    {"role": "system", "content": "You are a friendly chatbot who always responds in the style of a pirate."},
    {"role": "user", "content": "Hi. I would like to make a hotel booking."},
]

# Render the messages with the model's chat template, then decode greedily.
prompt = pipe.tokenizer.apply_chat_template(messages, tokenize=False, add_generation_prompt=True)
outputs = pipe(prompt, max_new_tokens=512, do_sample=False)

# The generated text includes the prompt; print only the completion.
print(outputs[0]["generated_text"][len(prompt):])
```


# Performance

At the time of release (April 2024), Llama3-TenyxChat-70B is the highest-ranked open-source model available for download on the MT-Bench evaluation.

## MT-Bench

MT-Bench is a benchmark made up of 80 high-quality multi-turn questions. These questions fall into eight categories: Writing, Roleplay, Reasoning, Math, Coding, Extraction, STEM, and Humanities. The chat models are rated by `GPT-4-preview-0125` on a scale of 1 to 10, with higher values corresponding to better responses.

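
To make the scoring concrete, the sketch below averages individual judge ratings into per-category and overall scores. The ratings are made-up placeholders, not results for any model, the helper name `summarize_scores` is hypothetical, and aggregating by a plain mean is an assumption here; the official FastChat scripts should be used for real evaluations.

```python
from collections import defaultdict
from statistics import mean

# Placeholder judge ratings as (category, score on the 1-10 scale); illustrative only.
ratings = [
    ("Writing", 9), ("Writing", 8),
    ("Math", 6), ("Math", 7),
    ("Coding", 8), ("Coding", 7),
]

def summarize_scores(ratings):
    """Return (per-category mean, overall mean) for a list of (category, score) pairs."""
    by_category = defaultdict(list)
    for category, score in ratings:
        if not 1 <= score <= 10:
            raise ValueError(f"score {score} is outside the 1-10 scale")
        by_category[category].append(score)
    per_category = {name: mean(scores) for name, scores in by_category.items()}
    overall = mean(score for _, score in ratings)
    return per_category, overall

per_category, overall = summarize_scores(ratings)
print(per_category, overall)
```

Per-category means of this kind are what a radar-style plot over the eight categories would display, while the overall mean corresponds to a single headline score.
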

| Model name                     | MT-Bench (GPT-4-preview-0125) | Chat Arena Elo |
|--------------------------------|-------------------------------|----------------|
| GPT-4-1106                     | 8.79                          | 1251           |
| Claude 3 Opus (20240229)       | 8.57                          | 1247           |
| **Llama3-TenyxChat-70B**       | **8.15**                      | NA             |
| *Llama3-70B-Instruct*          | 7.97                          | 1207           |
| Claude 3 Sonnet (20240229)     | 7.82                          | 1190           |
| GPT-4-0314                     | 7.96                          | 1185           |
| Mixtral                        | 7.38                          | 1114           |
| gpt-3.5-turbo-0613             | 7.37                          | 1113           |
| Yi-34B                         | 6.46                          | 1099           |
| gpt-3.5-turbo-0125             | 7.52                          | 1096           |
| Llama 2 70B                    | 6.01                          | 1082           |
| NV-Llama2-70B-SteerLM-Chat     | 6.57                          | 1076           |

![hexplot.png](hexplot_llama3-tenyxchat-70b.png)


# Limitations

Llama3-TenyxChat-70B, like other language models, has its own set of limitations. We have not fine-tuned the model explicitly to align with **human** safety preferences. It can therefore produce undesirable outputs, particularly when adversarially prompted. In our observation, the model still tends to struggle with reasoning tasks and math questions. In some instances, it may generate verbose or extraneous content.


# License

Llama3-TenyxChat-70B is distributed under the Meta Llama 3 Community License.

# Citation

If you use Llama3-TenyxChat-70B for your research, cite us as:

```
@misc{tenyxchat2024,
      title={TenyxChat: Language Model Alignment using Tenyx Fine-tuning},
      author={Tenyx},
      year={2024},
}
```