---
license: apache-2.0
language:
- en
datasets:
- Intel/orca_dpo_pairs
pipeline_tag: conversational
library_name: peft
tags:
- llm
- 7b
---
# Jaskier 7b DPO V2
This is a work-in-progress model and may not be ready for production use.
Model based on mindy-labs/mindy-7b-v2 (a downstream version of Mistral-7B), fine-tuned using Direct Preference Optimization on Intel/orca_dpo_pairs.
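
The training code is not published here, but for orientation, the sketch below shows one way a LoRA-based DPO fine-tune on Intel/orca_dpo_pairs could be set up with TRL and PEFT. All hyperparameters, the simplified prompt formatting, and the TRL version (the ~0.7-style DPOTrainer API) are assumptions for illustration, not the actual recipe used for Jaskier 7b DPO V2.

```python
# Hypothetical sketch of a DPO fine-tune of the base model on Intel/orca_dpo_pairs.
# Column names follow the dataset (system, question, chosen, rejected);
# hyperparameters are illustrative only. Written against the trl~=0.7 DPOTrainer API.
import torch
from datasets import load_dataset
from peft import LoraConfig
from transformers import AutoModelForCausalLM, AutoTokenizer, TrainingArguments
from trl import DPOTrainer

base_model = "mindy-labs/mindy-7b-v2"
tokenizer = AutoTokenizer.from_pretrained(base_model)
tokenizer.pad_token = tokenizer.eos_token

def to_dpo_format(sample):
    # Fold system prompt + question into a single prompt string (simplified).
    return {
        "prompt": f"{sample['system']}\n{sample['question']}",
        "chosen": sample["chosen"],
        "rejected": sample["rejected"],
    }

dataset = load_dataset("Intel/orca_dpo_pairs", split="train").map(
    to_dpo_format, remove_columns=["system", "question"]
)

model = AutoModelForCausalLM.from_pretrained(base_model, torch_dtype=torch.bfloat16)

peft_config = LoraConfig(
    r=16,
    lora_alpha=16,
    lora_dropout=0.05,
    target_modules=["q_proj", "k_proj", "v_proj", "o_proj"],
    task_type="CAUSAL_LM",
)

training_args = TrainingArguments(
    output_dir="jaskier-7b-dpo",
    per_device_train_batch_size=2,
    gradient_accumulation_steps=8,
    learning_rate=5e-5,
    num_train_epochs=1,
    bf16=True,
)

trainer = DPOTrainer(
    model,
    ref_model=None,  # with a PEFT config, the frozen base weights act as the reference model
    args=training_args,
    beta=0.1,
    train_dataset=dataset,
    tokenizer=tokenizer,
    peft_config=peft_config,
)
trainer.train()
```

Passing `ref_model=None` together with a `peft_config` lets TRL use the frozen base weights as the implicit reference model, which keeps memory usage close to a single 7B model.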
## How to use
You can use this model directly with a text-generation pipeline:
```python
import torch
from transformers import AutoTokenizer, StoppingCriteria, StoppingCriteriaList, pipeline


class StopOnTokens(StoppingCriteria):
    """Stop generation once the end-of-turn token sequence has been produced."""

    def __call__(self, input_ids: torch.LongTensor, scores: torch.FloatTensor, **kwargs) -> bool:
        stop_ids = torch.tensor([28789, 28766, 321, 28730, 416, 28766, 28767]).to(input_ids.device)
        if len(input_ids[0]) < len(stop_ids):
            return False
        if torch.equal(input_ids[0][-len(stop_ids):], stop_ids):
            return True
        return False


model_name = "bardsai/jaskier-7b-dpo-v2"
tokenizer = AutoTokenizer.from_pretrained(model_name)
pipe = pipeline(
    "text-generation",
    model=model_name,
    tokenizer=tokenizer,
    device="cuda:0"
)
messages = [
    {"role": "system", "content": "Your task is to extract country names from the text provided by user. Return in comma-separated format."},
    {"role": "user", "content": "Germany,[e] officially the Federal Republic of Germany,[f] is a country in the western region of Central Europe. It is the second-most populous country in Europe after Russia,[g] and the most populous member state of the European Union. Germany lies between the Baltic and North Sea to the north and the Alps to the south. Its 16 constituent states have a total population of over 84 million, cover a combined area of 357,600 km2 (138,100 sq mi) and are bordered by Denmark to the north, Poland and the Czech Republic to the east, Austria and Switzerland to the south, and France, Luxembourg, Belgium, and the Netherlands to the west. The nation's capital and most populous city is Berlin and its main financial centre is Frankfurt; the largest urban area is the Ruhr."}
]
prompt = tokenizer.apply_chat_template(messages, add_generation_prompt=True, tokenize=False)
# Generate text
sequences = pipe(
    prompt,
    do_sample=True,
    temperature=0.7,
    top_p=0.9,
    num_return_sequences=1,
    max_length=300,
    stopping_criteria=StoppingCriteriaList([StopOnTokens()])
)
print(sequences[0])
```
## Output

```
Germany,Denmark,Poland,Czech Republic,Austria,Switzerland,France,Luxembourg,Belgium,Netherlands
```
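
The `stop_ids` in `StopOnTokens` above are hard-coded token IDs. If you want to check what they correspond to, or rebuild the criterion for a different tokenizer, you can decode them directly; this is a small sketch assuming they spell out the model's end-of-turn marker (e.g. `<|im_end|>`).

```python
from transformers import AutoTokenizer

tokenizer = AutoTokenizer.from_pretrained("bardsai/jaskier-7b-dpo-v2")

# Decode the hard-coded stop IDs used above to see which string they represent.
stop_ids = [28789, 28766, 321, 28730, 416, 28766, 28767]
print(tokenizer.decode(stop_ids))

# To adapt StopOnTokens to another marker, re-encode it without special tokens and
# use the resulting IDs (the result may differ slightly from the hard-coded list,
# e.g. due to leading-space handling in the tokenizer).
print(tokenizer.encode("<|im_end|>", add_special_tokens=False))
```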
## Changelog
- 2024-01-10: Initial release
## About bards.ai
At bards.ai, we focus on providing machine learning expertise and skills to our partners, particularly in the areas of NLP, machine vision and time series analysis. Our team is located in Wroclaw, Poland. Please visit our website for more information: bards.ai
Let us know if you use our model :). Also, if you need any help, feel free to contact us at [email protected].