
Model Card for WeeRobots/phi-2-chat-v05

Phi-2-chat-v05 is a fine-tuned version of Phi-2 that improves the model's understanding of instructions and multi-turn conversations. In essence: it now has a concept of shutting up after an answer is given, as opposed to switching into random-generator mode.

Fine-tuning used 25k records from the HuggingFaceH4/ultrachat_200k dataset.

Prompt format

<|system|>
You are a helpful assistant....
<|user|>
Why is the sky blue?
<|assistant|>
The sky appears blue because of a phenomenon called Rayleigh scattering. When sunlight enters the Earth's atmosphere [...]
<|user|>
Who was the phenomenon named after?
<|assistant|>

The model generates its output after the special token <|assistant|>. That token must be present at the end of the input to get a reliable response. Alternatively, you can use the tokenizer's chat_template, as shown below.
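
For reference, here is a minimal sketch of assembling such a prompt by hand. The exact whitespace and any end-of-turn tokens are assumptions here; the tokenizer's chat_template used below is the authoritative source.

# A sketch of building the prompt string by hand, mirroring the format above.
# Note the trailing <|assistant|> tag: the model generates its answer after it.
prompt = (
    "<|system|>\nYou are a helpful assistant.\n"
    "<|user|>\nWhy is the sky blue?\n"
    "<|assistant|>\n"
)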

How to use it?

Dependencies

pip install -U torch transformers einops
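
Optionally, a quick sanity check (not specific to this model) that PyTorch can see your GPU before loading the weights:

import torch

# Should print True on a machine with a working CUDA setup.
print(torch.cuda.is_available())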

Code for inference:

import torch
from transformers import AutoTokenizer, AutoModelForCausalLM

model_id = "WeeRobots/phi-2-chat-v05"

model = AutoModelForCausalLM.from_pretrained(model_id, device_map={"": 0}, trust_remote_code=True)
tokenizer = AutoTokenizer.from_pretrained(model_id, use_fast=False, trust_remote_code=True)

payload = tokenizer.apply_chat_template([
    { 'role': 'system', 'content': '''You are a state machine. The user will add state slot values and you'll keep track of them.''' },
    { 'role': 'user', 'content': '''Place 15 into slot apple''' },
    { 'role': 'assistant', 'content': '''Roger that.''' },
    { 'role': 'user', 'content': '''Bananas slot should be 20''' },
    { 'role': 'assistant', 'content': '''Certainly''' },
    { 'role': 'user', 'content': '''What is the value of Apple + Banana?''' },
], tokenize=False, add_generation_prompt=True,)
device = "cuda"
model_input = tokenizer(payload, return_tensors="pt").to(device)
with torch.no_grad():
  # IMPORTANT: always set eos_token_id in this call. The model is trained to emit the eos token
  # at the right time, but it might continue generating irrelevant text afterwards; setting
  # eos_token_id makes generation stop at the right place.
  model_response = model.generate(**model_input, max_new_tokens=512, eos_token_id=tokenizer.eos_token_id)
  print(tokenizer.decode(model_response[0], skip_special_tokens=False))
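
If you only want the assistant's reply rather than the full conversation echoed back, you can decode just the newly generated tokens. A small sketch building on the variables above:

# generate() returns the prompt tokens followed by the new tokens; slice off the prompt.
prompt_length = model_input["input_ids"].shape[1]
reply = tokenizer.decode(model_response[0][prompt_length:], skip_special_tokens=True)
print(reply)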

Not production quality

Be aware that this fine-tune wasn't thoroughly tested and isn't meant to be used in production, only for experimentation or hobby projects.

Model size: 2.78B params · Tensor type: FP16 · Format: Safetensors
