	---
	language:
	- en
	license: other
	library_name: transformers
	tags:
	- orpo
	- llama 3
	- rlhf
	- sft
	datasets:
	- mlabonne/orpo-dpo-mix-40k
	---

	# OrpoLlama-3-8B

	![](https://i.imgur.com/ZHwzQvI.png)

	This is an ORPO fine-tune of [meta-llama/Meta-Llama-3-8B](https://huggingface.co/meta-llama/Meta-Llama-3-8B) on 1k samples of [mlabonne/orpo-dpo-mix-40k](https://huggingface.co/datasets/mlabonne/orpo-dpo-mix-40k) created for [this article](https://huggingface.co/blog/mlabonne/orpo-llama-3).

	It's a successful fine-tune that follows the ChatML template!

	Try the demo: https://huggingface.co/spaces/mlabonne/OrpoLlama-3-8B

	## 🔎 Application

	This model uses a context window of 8k tokens and was trained with the ChatML template.
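
As an illustration of what a ChatML prompt looks like, here is a minimal hand-written sketch. The `to_chatml` helper is hypothetical and only approximates the format. The chat template shipped with the model's tokenizer (used through `apply_chat_template`) remains the authoritative definition, including exact whitespace and special tokens.

```python
def to_chatml(messages, add_generation_prompt=True):
    """Render a list of {"role", "content"} dicts as a ChatML-style prompt string."""
    parts = []
    for message in messages:
        # Each turn is wrapped in <|im_start|> ... <|im_end|> markers.
        parts.append(f"<|im_start|>{message['role']}\n{message['content']}<|im_end|>\n")
    if add_generation_prompt:
        # Open an assistant turn so the model continues from here.
        parts.append("<|im_start|>assistant\n")
    return "".join(parts)


messages = [{"role": "user", "content": "What is a large language model?"}]
print(to_chatml(messages))
```

In practice, prefer `tokenizer.apply_chat_template`, which reads the template from the model repository instead of hard-coding it.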

	## ⚡ Quantized models

	Thanks to bartowski, solidrust, and LoneStriker for the quantized models.

	* GGUF: https://huggingface.co/bartowski/OrpoLlama-3-8B-GGUF
	* AWQ: https://huggingface.co/solidrust/OrpoLlama-3-8B-AWQ
	* EXL2:
	  * https://huggingface.co/LoneStriker/OrpoLlama-3-8B-3.0bpw-h6-exl2
	  * https://huggingface.co/LoneStriker/OrpoLlama-3-8B-4.0bpw-h6-exl2
	  * https://huggingface.co/LoneStriker/OrpoLlama-3-8B-5.0bpw-h6-exl2
	  * https://huggingface.co/LoneStriker/OrpoLlama-3-8B-6.0bpw-h6-exl2
	  * https://huggingface.co/LoneStriker/OrpoLlama-3-8B-8.0bpw-h8-exl2
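
To help choose among the EXL2 variants, here is a rough back-of-the-envelope sketch that converts the bits-per-weight (bpw) figure in each repository name into an approximate weight size. It assumes about 8 billion weights (taken from the "8B" in the model name) and ignores the output-head precision, any tensors kept at higher precision, and runtime memory such as the KV cache, so treat the numbers as lower-bound estimates rather than measured file sizes.

```python
# Assumption: roughly 8 billion weights, inferred from the "8B" in the model name.
N_PARAMS = 8e9


def approx_weight_gb(bits_per_weight, n_params=N_PARAMS):
    """Approximate size of the quantized weights in decimal gigabytes."""
    # bits -> bytes (divide by 8) -> gigabytes (divide by 1e9)
    return n_params * bits_per_weight / 8 / 1e9


for bpw in (3.0, 4.0, 5.0, 6.0, 8.0):
    print(f"{bpw:.1f} bpw: ~{approx_weight_gb(bpw):.1f} GB")
```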

	## 🏆 Evaluation

	### Nous

	OrpoLlama-3-8B outperforms Llama-3-8B-Instruct on the GPT4All and TruthfulQA datasets.

	Evaluation was performed using [LLM AutoEval](https://github.com/mlabonne/llm-autoeval). See the entire leaderboard [here](https://huggingface.co/spaces/mlabonne/Yet_Another_LLM_Leaderboard).

	| Model | Average | AGIEval | GPT4All | TruthfulQA | Bigbench |
	| --- | ---: | ---: | ---: | ---: | ---: |
	| [meta-llama/Meta-Llama-3-8B-Instruct](https://huggingface.co/meta-llama/Meta-Llama-3-8B-Instruct) [📄](https://gist.github.com/mlabonne/8329284d86035e6019edb11eb0933628) | 51.34 | 41.22 | 69.86 | 51.65 | 42.64 |
	| [mlabonne/OrpoLlama-3-8B](https://huggingface.co/mlabonne/OrpoLlama-3-8B) [📄](https://gist.github.com/mlabonne/22896a1ae164859931cc8f4858c97f6f) | 48.63 | 34.17 | 70.59 | 52.39 | 37.36 |
	| [mlabonne/OrpoLlama-3-8B-1k](https://huggingface.co/mlabonne/OrpoLlama-3-8B) [📄](https://gist.github.com/mlabonne/f41dad371d1781d0434a4672fd6f0b82) | 46.76 | 31.56 | 70.19 | 48.11 | 37.17 |
	| [meta-llama/Meta-Llama-3-8B](https://huggingface.co/meta-llama/Meta-Llama-3-8B) [📄](https://gist.github.com/mlabonne/616b6245137a9cfc4ea80e4c6e55d847) | 45.42 | 31.1 | 69.95 | 43.91 | 36.7 |

	`mlabonne/OrpoLlama-3-8B-1k` corresponds to a version of this model trained on 1K samples (you can see the parameters in [this article](https://huggingface.co/blog/mlabonne/orpo-llama-3)).
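
The Average column appears to be the arithmetic mean of the four benchmark scores. The following sketch recomputes it from the numbers in the table above and prints the gaps behind the GPT4All and TruthfulQA comparison. The dictionary layout and helper name are illustrative choices, not part of LLM AutoEval.

```python
# Scores copied from the table: (reported average, AGIEval, GPT4All, TruthfulQA, Bigbench)
scores = {
    "meta-llama/Meta-Llama-3-8B-Instruct": (51.34, 41.22, 69.86, 51.65, 42.64),
    "mlabonne/OrpoLlama-3-8B": (48.63, 34.17, 70.59, 52.39, 37.36),
    "mlabonne/OrpoLlama-3-8B-1k": (46.76, 31.56, 70.19, 48.11, 37.17),
    "meta-llama/Meta-Llama-3-8B": (45.42, 31.1, 69.95, 43.91, 36.7),
}


def mean_of_benchmarks(row):
    """Average the four benchmark scores, skipping the reported average at index 0."""
    benchmarks = row[1:]
    return sum(benchmarks) / len(benchmarks)


for name, row in scores.items():
    print(f"{name}: reported {row[0]:.2f}, recomputed {mean_of_benchmarks(row):.4f}")

# Gaps between OrpoLlama-3-8B and Llama-3-8B-Instruct on the two benchmarks it wins.
orpo = scores["mlabonne/OrpoLlama-3-8B"]
instruct = scores["meta-llama/Meta-Llama-3-8B-Instruct"]
gpt4all_gap = orpo[2] - instruct[2]
truthfulqa_gap = orpo[3] - instruct[3]
print(f"GPT4All gap: {gpt4all_gap:+.2f}, TruthfulQA gap: {truthfulqa_gap:+.2f}")
```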

	### Open LLM Leaderboard

	TBD.

	## 📈 Training curves

	You can find the experiment on W&B at [this address](https://wandb.ai/mlabonne/DPO/runs/vxnmq24z/workspace?nw=nwusermlabonne).

	![image/png](https://cdn-uploads.huggingface.co/production/uploads/61b8e2ba285851687028d395/zm71HyZiG96YY1GUtpfHq.png)

	## 💻 Usage

	```python
	# In a notebook, install the dependencies first:
	# !pip install -qU transformers accelerate

	import torch
	import transformers
	from transformers import AutoTokenizer

	model = "mlabonne/OrpoLlama-3-8B"
	messages = [{"role": "user", "content": "What is a large language model?"}]

	# Format the conversation with the model's chat template (ChatML)
	tokenizer = AutoTokenizer.from_pretrained(model)
	prompt = tokenizer.apply_chat_template(messages, tokenize=False, add_generation_prompt=True)

	# Load the model in half precision and place it on the available device(s)
	pipeline = transformers.pipeline(
	    "text-generation",
	    model=model,
	    torch_dtype=torch.float16,
	    device_map="auto",
	)

	# Sample up to 256 new tokens
	outputs = pipeline(prompt, max_new_tokens=256, do_sample=True, temperature=0.7, top_k=50, top_p=0.95)
	print(outputs[0]["generated_text"])
	```
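
By default, the text-generation pipeline returns the prompt followed by the completion in `generated_text`. If you only want the model's answer, you can pass `return_full_text=False` to the pipeline call, or strip the prompt yourself. The sketch below shows the second option with a hypothetical `completion_only` helper and placeholder strings standing in for `prompt` and `outputs[0]["generated_text"]`.

```python
def completion_only(generated_text, prompt):
    """Return the text after the prompt, or the full text if the prompt is not a prefix."""
    if generated_text.startswith(prompt):
        return generated_text[len(prompt):]
    return generated_text


# Hypothetical strings standing in for `prompt` and `outputs[0]["generated_text"]`.
example_prompt = "<|im_start|>user\nHi<|im_end|>\n<|im_start|>assistant\n"
example_output = example_prompt + "Hello! How can I help?"
print(completion_only(example_output, example_prompt))
```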