
Llama-3.1-8B-Chat

meta-llama/Meta-Llama-3.1-8B fine-tuned for chat completions.

As required by the Llama 3.1 license, this model was Built with Llama.

Quick start

Simply load the model and generate responses:

from transformers import AutoModelForCausalLM, AutoTokenizer

model = AutoModelForCausalLM.from_pretrained("mathewhe/Llama-3.1-8B-Chat")
tokenizer = AutoTokenizer.from_pretrained("mathewhe/Llama-3.1-8B-Chat")

messages = [
    {"role": "user", "content": "What is an LLM?"},
]

# Apply the chat template and tokenize; return_dict=True yields tensors
# that can be passed straight to generate(), and add_generation_prompt
# appends the assistant prefix so the model starts its reply.
inputs = tokenizer.apply_chat_template(
    messages,
    add_generation_prompt=True,
    return_tensors="pt",
    return_dict=True,
)

print(tokenizer.decode(model.generate(**inputs)[0]))
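By default, generate() produces only a short continuation. For chat-length replies, pass standard transformers generation arguments; the values below are illustrative, not tuned for this model:

# Generate a longer, sampled reply; all arguments are standard
# transformers generation parameters with illustrative values.
outputs = model.generate(
    **inputs,
    max_new_tokens=512,
    do_sample=True,
    temperature=0.7,
    top_p=0.9,
)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))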

Alternatively, copy the included chat_class.py module into your local directory and just import the Chat class:

from chat_class import Chat
chat = Chat(
    "mathewhe/Llama-3.1-8B-Chat",
    device="cuda",
)

# for one-off instructions
instruction = "Write an ingredient list for banana pudding."
print(chat.instruct(instruction))

# for multi-turn chat
response1 = chat.message("Hi, please explain what DNA is.")
response2 = chat.message("Tell me more about how its discovery affected society.")

# to reset the chat
chat.reset()
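For reference, here is a minimal sketch of what such a Chat wrapper might look like. This is a hypothetical implementation matching the interface above; the actual chat_class.py may differ:

import torch
from transformers import AutoModelForCausalLM, AutoTokenizer


class Chat:
    """Hypothetical sketch of the Chat interface used above."""

    def __init__(self, model_id: str, device: str = "cpu"):
        self.device = device
        self.tokenizer = AutoTokenizer.from_pretrained(model_id)
        self.model = AutoModelForCausalLM.from_pretrained(model_id).to(device)
        self.history = []

    def _generate(self, messages):
        # Format the conversation with the chat template and generate a reply.
        inputs = self.tokenizer.apply_chat_template(
            messages,
            add_generation_prompt=True,
            return_tensors="pt",
            return_dict=True,
        ).to(self.device)
        with torch.no_grad():
            output = self.model.generate(**inputs, max_new_tokens=512)
        # Decode only the newly generated tokens, not the prompt.
        new_tokens = output[0, inputs["input_ids"].shape[1]:]
        return self.tokenizer.decode(new_tokens, skip_special_tokens=True)

    def instruct(self, instruction: str) -> str:
        # One-off instruction; does not touch the stored conversation.
        return self._generate([{"role": "user", "content": instruction}])

    def message(self, text: str) -> str:
        # Multi-turn chat; extends the stored conversation history.
        self.history.append({"role": "user", "content": text})
        reply = self._generate(self.history)
        self.history.append({"role": "assistant", "content": reply})
        return reply

    def reset(self):
        # Clear the conversation history.
        self.history = []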

Performance

We verified that this model was successfully aligned for both multi-turn dialogue and one-off instruction following.

Model                                    AlpacaEval   AlpacaEval-LC
meta-llama/Meta-Llama-3.1-8B-Instruct         21.84           20.85
mathewhe/Llama-3.1-8B-Chat                    12.16           20.53

Chat template

This model uses the following chat template and does not support a separate system prompt:

<|begin_of_text|>[INST]<user-message>[/INST][ASST]<llm-response>[/ASST]<|end_of_text|>

The included tokenizer will correctly format messages, so you should not have to manually format the input text.

Instead, use the tokenizer's apply_chat_template() method on a list of messages. Each message should be a dict with two keys:

  • "role": Either "user" or "assistant".
  • "content": The message to include.

For example:

from transformers import AutoTokenizer

tokenizer = AutoTokenizer.from_pretrained("mathewhe/Llama-3.1-8B-Chat")

messages = [
    {"role": "user", "content": "Solve for x: 3x=4"},
    {"role": "assistant", "content": "3x=4\n(3x)/3=(4)/3\nx=4/3"},
    {"role": "user", "content": "Please explain your work."},
]
print(tokenizer.apply_chat_template(messages, tokenize=False))

outputs:

<|begin_of_text|>[INST]Solve for x: 3x=4[/INST][ASST]3x=4
(3x)/3=(4)/3
x=4/3[/ASST]<|end_of_text|><|begin_of_text|>[INST]Please explain your work.[/INST]
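This behavior corresponds to a Jinja chat template along the following lines, shown here as a Python string. This is a sketch inferred from the output above, not the authoritative template, which ships with the tokenizer:

# Hypothetical reconstruction of the Jinja chat template, inferred from
# the example output above; the tokenizer already includes the real one,
# so you should never need to set this yourself.
CHAT_TEMPLATE = (
    "{%- for message in messages -%}"
    "{%- if message['role'] == 'user' -%}"
    "<|begin_of_text|>[INST]{{ message['content'] }}[/INST]"
    "{%- elif message['role'] == 'assistant' -%}"
    "[ASST]{{ message['content'] }}[/ASST]<|end_of_text|>"
    "{%- endif -%}"
    "{%- endfor -%}"
    "{%- if add_generation_prompt -%}[ASST]{%- endif -%}"
)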

See the example code in the included chat_class.py module for more details.

Data

This model was trained on the following three datasets:
