swap-uniba/LLaMAntino-3-ANITA-8B-Inst-DPO-ITA

@@ -24,12 +24,6 @@ license: llama3
 <hr>
 <!--<img src="https://i.ibb.co/6mHSRm3/llamantino53.jpg" width="200"/>-->
-## Model Details
-*Last Update: 29/04/2024*<br>
-*GitHub Link* → [https://github.com/marcopoli/LLaMAntino-3-ANITA](https://github.com/marcopoli/LLaMAntino-3-ANITA)<br>
-<hr>
 **LLaMAntino-3-ANITA-8B-sft-DPO** is a model of the [**LLaMAntino**](https://huggingface.co/swap-uniba) - *Large Language Models family*.
 The model is an instruction-tuned version of [**Meta-Llama-3-8b-instruct**](https://huggingface.co/meta-llama/Meta-Llama-3-8B-Instruct) (a fine-tuned **LLaMA 3 model**).
 This model version aims to be the **Multilingual Base-Model** 🏁 for further fine-tuning in the Italian environment.
@@ -40,14 +34,22 @@ wants to provide Italian NLP researchers with an improved model the for Italian
 <hr>
 ## Specifications
-- **Model developers**: Ph.D. Marco Polignano - University of Bari Aldo Moro, Italy
-- **Variations**: The released model was trained with **supervised fine-tuning (SFT)** using **QLoRA** on a long list of instruction-based datasets. **DPO** over the *HuggingFaceH4/ultrafeedback_binarized* dataset is then used to align the model with human preferences for helpfulness and safety.
 - **Input**: Models input text only.
 - **Output**: Models generate text and code only.
 - **Model Architecture**: *Llama 3 architecture*.
 - **Context length**: 8K, 8192.
 <hr>
 ## Playground
@@ -74,7 +76,7 @@ For direct use with `transformers`, you can easily get started with the followin
       AutoTokenizer,
   )
-  base_model = "m-polignano-uniba/LLaMAntino-3-ANITA-8B-sft-DPO"
   model = AutoModelForCausalLM.from_pretrained(
       base_model,
       torch_dtype=torch.bfloat16,
@@ -83,8 +85,10 @@ For direct use with `transformers`, you can easily get started with the followin
   tokenizer = AutoTokenizer.from_pretrained(base_model)
   messages = [
-      {"role": "system", "content": "Answer clearly and detailed."},
-      {"role": "user", "content": "Why is the sky blue ?"}
   ]
   #Method 1
@@ -92,7 +96,7 @@ For direct use with `transformers`, you can easily get started with the followin
   inputs = tokenizer(prompt, return_tensors="pt", add_special_tokens=False)
   for k,v in inputs.items():
       inputs[k] = v.cuda()
-  outputs = model.generate(**inputs, max_new_tokens=512, do_sample=True, top_p=0.85, temperature=0.7)
   results = tokenizer.batch_decode(outputs)[0]
   print(results)
@@ -104,9 +108,9 @@ For direct use with `transformers`, you can easily get started with the followin
       return_full_text=False, # langchain expects the full text
       task='text-generation',
       max_new_tokens=512, # max number of tokens to generate in the output
-      temperature=0.7,  #temperature for more or less creative answers
       do_sample=True,
-      top_p=0.85,
   )
   sequences = pipe(messages)
@@ -125,7 +129,7 @@ For direct use with `transformers`, you can easily get started with the followin
       BitsAndBytesConfig,
   )
-  base_model = "m-polignano-uniba/LLaMAntino-3-ANITA-8B-sft-DPO"
   bnb_config = BitsAndBytesConfig(
       load_in_4bit=True,
       bnb_4bit_quant_type="nf4",
@@ -140,8 +144,10 @@ For direct use with `transformers`, you can easily get started with the followin
   tokenizer = AutoTokenizer.from_pretrained(base_model)
   messages = [
-      {"role": "system", "content": "Answer clearly and detailed."},
-      {"role": "user", "content": "Why is the sky blue ?"}
   ]
   #Method 1
@@ -149,7 +155,7 @@ For direct use with `transformers`, you can easily get started with the followin
   inputs = tokenizer(prompt, return_tensors="pt", add_special_tokens=False)
   for k,v in inputs.items():
       inputs[k] = v.cuda()
-  outputs = model.generate(**inputs, max_new_tokens=512, do_sample=True, top_p=0.85, temperature=0.7)
   results = tokenizer.batch_decode(outputs)[0]
   print(results)
@@ -161,9 +167,9 @@ For direct use with `transformers`, you can easily get started with the followin
       return_full_text=False, # langchain expects the full text
       task='text-generation',
       max_new_tokens=512, # max number of tokens to generate in the output
-      temperature=0.7,  #temperature for more or less creative answers
       do_sample=True,
-      top_p=0.85,
   )
   sequences = pipe(messages)
@@ -187,7 +193,7 @@ For direct use with `unsloth`, you can easily get started with the following ste
   from unsloth import FastLanguageModel
   import torch
-  base_model = "m-polignano-uniba/LLaMAntino-3-ANITA-8B-sft-DPO"
   model, tokenizer = FastLanguageModel.from_pretrained(
       model_name = base_model,
       max_seq_length = 8192,
@@ -200,14 +206,16 @@ For direct use with `unsloth`, you can easily get started with the following ste
 - Right now, you can start using the model directly.
   ```python
   messages = [
-      {"role": "system", "content": "Answer clearly and detailed."},
-      {"role": "user", "content": "Why is the sky blue ?"}
   ]
   prompt = tokenizer.apply_chat_template(messages, tokenize=False, add_generation_prompt=True)
   inputs = tokenizer(prompt, return_tensors="pt", add_special_tokens=False)
   for k,v in inputs.items():
       inputs[k] = v.cuda()
-  outputs = model.generate(**inputs, max_new_tokens=512, do_sample=True, top_p=0.85, temperature=0.7)
   results = tokenizer.batch_decode(outputs)[0]
   print(results)
   ```
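To make the `apply_chat_template` step less opaque, here is a minimal sketch of the text it is expected to render, assuming the stock Llama 3 instruct layout. `render_llama3_prompt` is a hypothetical helper written for illustration, not part of `transformers`, and the tokenizer's own template remains authoritative for this model. Note that the rendered text already begins with the begin-of-text token, which is plausibly why the snippets tokenize it with `add_special_tokens=False`.

```python
from typing import Dict, List


def render_llama3_prompt(
    messages: List[Dict[str, str]], add_generation_prompt: bool = True
) -> str:
    """Hand-rolled approximation of the Llama 3 instruct chat layout (assumption)."""
    parts = ["<|begin_of_text|>"]
    for message in messages:
        role, content = message["role"], message["content"]
        # Each message body is expected to be a plain string.
        if not isinstance(content, str):
            raise TypeError(f"content for role {role!r} must be a string")
        parts.append(
            f"<|start_header_id|>{role}<|end_header_id|>\n\n{content.strip()}<|eot_id|>"
        )
    if add_generation_prompt:
        # Open an assistant turn so the model continues with its answer.
        parts.append("<|start_header_id|>assistant<|end_header_id|>\n\n")
    return "".join(parts)


if __name__ == "__main__":
    demo = [
        {"role": "system", "content": "Answer clearly and detailed."},
        {"role": "user", "content": "Why is the sky blue ?"},
    ]
    print(render_llama3_prompt(demo))
```

Comparing this output with `tokenizer.apply_chat_template(messages, tokenize=False, add_generation_prompt=True)` is a quick way to check whether the model ships a customized template.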

 <hr>
 <!--<img src="https://i.ibb.co/6mHSRm3/llamantino53.jpg" width="200"/>-->
 **LLaMAntino-3-ANITA-8B-sft-DPO** is a model of the [**LLaMAntino**](https://huggingface.co/swap-uniba) - *Large Language Models family*.
 The model is an instruction-tuned version of [**Meta-Llama-3-8b-instruct**](https://huggingface.co/meta-llama/Meta-Llama-3-8B-Instruct) (a fine-tuned **LLaMA 3 model**).
 This model version aims to be the **Multilingual Base-Model** 🏁 for further fine-tuning in the Italian environment.
 <hr>
+## Model Details
+*Last Update: 10/05/2024*<br>
+<img src="https://static.vecteezy.com/system/resources/previews/016/833/880/large_2x/github-logo-git-hub-icon-with-text-on-white-background-free-vector.jpg" width="200"> [https://github.com/marcopoli/LLaMAntino-3-ANITA](https://github.com/marcopoli/LLaMAntino-3-ANITA)<br>
+<hr>
 ## Specifications
+- **Model developers**: Ph.D. Marco Polignano - University of Bari Aldo Moro, Italy - SWAP Research Group
+- **Variations**: The released model was trained with **supervised fine-tuning (SFT)** using 4-bit **QLoRA** on two instruction-based datasets. **DPO** over the *jondurbin/truthy-dpo-v0.1* dataset is then used to align the model with human preferences for helpfulness and safety.
 - **Input**: Models input text only.
 - **Output**: Models generate text and code only.
 - **Model Architecture**: *Llama 3 architecture*.
 - **Context length**: 8K, 8192.
+- **Library Used**: [Unsloth](https://unsloth.ai/)
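
The 8192-token context length interacts with the `max_new_tokens=512` setting used in the usage snippets: the prompt and the generated tokens share the same window. The following is a minimal sketch of that arithmetic; `prompt_budget` and `fits_context` are hypothetical helper names introduced here, not part of any library.

```python
CONTEXT_LENGTH = 8192  # context length stated in the specification list


def prompt_budget(max_new_tokens: int, context_length: int = CONTEXT_LENGTH) -> int:
    """Return how many prompt tokens remain once generation space is reserved."""
    if not 0 < max_new_tokens < context_length:
        raise ValueError("max_new_tokens must be between 1 and context_length - 1")
    return context_length - max_new_tokens


def fits_context(
    prompt_token_count: int,
    max_new_tokens: int,
    context_length: int = CONTEXT_LENGTH,
) -> bool:
    """Check whether a prompt of the given token count leaves room for the reply."""
    return prompt_token_count <= prompt_budget(max_new_tokens, context_length)


if __name__ == "__main__":
    print(prompt_budget(512))
    print(fits_context(8000, 512))
```

In practice the prompt token count would come from the tokenizer, for example the length of `inputs["input_ids"][0]` in the snippets of this model card.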
 <hr>
 ## Playground
       AutoTokenizer,
   )
+  base_model = "m-polignano-uniba/LLaMAntino-3-ANITA-8B-Instr-DPO-ITA"
   model = AutoModelForCausalLM.from_pretrained(
       base_model,
       torch_dtype=torch.bfloat16,
   tokenizer = AutoTokenizer.from_pretrained(base_model)
   messages = [
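+      # System prompt (Italian): "You are an AI assistant for the Italian language named LLaMAntino-3 ANITA. Answer in the language of the question, clearly, simply and exhaustively."
+      {"role": "system", "content": "Sei un assistente AI per la lingua Italiana di nome LLaMAntino-3 ANITA "
+       "(Advanced Natural-based interaction for the ITAlian language). "
+       "Rispondi nella lingua usata per la domanda in modo chiaro, semplice ed esaustivo."},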
+      {"role": "user", "content": "Why is the sky blue?"}
   ]
   #Method 1
   inputs = tokenizer(prompt, return_tensors="pt", add_special_tokens=False)
   for k,v in inputs.items():
       inputs[k] = v.cuda()
+  outputs = model.generate(**inputs, max_new_tokens=512, do_sample=True, top_p=0.9, temperature=0.6)
   results = tokenizer.batch_decode(outputs)[0]
   print(results)
       return_full_text=False, # langchain expects the full text
       task='text-generation',
       max_new_tokens=512, # max number of tokens to generate in the output
+      temperature=0.6,  #temperature for more or less creative answers
       do_sample=True,
+      top_p=0.9,
   )
   sequences = pipe(messages)
       BitsAndBytesConfig,
   )
+  base_model = "m-polignano-uniba/LLaMAntino-3-ANITA-8B-Instr-DPO-ITA"
   bnb_config = BitsAndBytesConfig(
       load_in_4bit=True,
       bnb_4bit_quant_type="nf4",
   tokenizer = AutoTokenizer.from_pretrained(base_model)
   messages = [
+      {"role": "system", "content": "Sei un assistente AI per la lingua Italiana di nome LLaMAntino-3 ANITA "
+       "(Advanced Natural-based interaction for the ITAlian language). "
+       "Rispondi nella lingua usata per la domanda in modo chiaro, semplice ed esaustivo."},
+      {"role": "user", "content": "Why is the sky blue?"}
   ]
   #Method 1
   inputs = tokenizer(prompt, return_tensors="pt", add_special_tokens=False)
   for k,v in inputs.items():
       inputs[k] = v.cuda()
+  outputs = model.generate(**inputs, max_new_tokens=512, do_sample=True, top_p=0.9, temperature=0.6)
   results = tokenizer.batch_decode(outputs)[0]
   print(results)
       return_full_text=False, # langchain expects the full text
       task='text-generation',
       max_new_tokens=512, # max number of tokens to generate in the output
+      temperature=0.6,  #temperature for more or less creative answers
       do_sample=True,
+      top_p=0.9,
   )
   sequences = pipe(messages)
   from unsloth import FastLanguageModel
   import torch
+  base_model = "m-polignano-uniba/LLaMAntino-3-ANITA-8B-Instr-DPO-ITA"
   model, tokenizer = FastLanguageModel.from_pretrained(
       model_name = base_model,
       max_seq_length = 8192,
 - Right now, you can start using the model directly.
   ```python
   messages = [
+      {"role": "system", "content": "Sei un assistente AI per la lingua Italiana di nome LLaMAntino-3 ANITA "
+       "(Advanced Natural-based interaction for the ITAlian language). "
+       "Rispondi nella lingua usata per la domanda in modo chiaro, semplice ed esaustivo."},
+      {"role": "user", "content": "Why is the sky blue?"}
   ]
   prompt = tokenizer.apply_chat_template(messages, tokenize=False, add_generation_prompt=True)
   inputs = tokenizer(prompt, return_tensors="pt", add_special_tokens=False)
   for k,v in inputs.items():
       inputs[k] = v.cuda()
+  outputs = model.generate(**inputs, max_new_tokens=512, do_sample=True, top_p=0.9, temperature=0.6)
   results = tokenizer.batch_decode(outputs)[0]
   print(results)
   ```
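
This change moves the sampling settings from `top_p=0.85, temperature=0.7` to `top_p=0.9, temperature=0.6`. The sketch below illustrates, on a toy distribution, what those two knobs do: temperature rescales the logits before the softmax, and top-p keeps the smallest set of most likely tokens whose cumulative probability reaches the threshold. The function `temperature_top_p` and the toy logits are assumptions made for illustration; the library's real implementation works on tensors and may differ in edge cases.

```python
import math
from typing import Dict


def temperature_top_p(
    logits: Dict[str, float], temperature: float, top_p: float
) -> Dict[str, float]:
    """Return the renormalized sampling distribution after temperature and top-p."""
    # Temperature-scaled softmax (subtract the max for numerical stability).
    scaled = {tok: value / temperature for tok, value in logits.items()}
    peak = max(scaled.values())
    exps = {tok: math.exp(value - peak) for tok, value in scaled.items()}
    total = sum(exps.values())
    probs = {tok: value / total for tok, value in exps.items()}

    # Nucleus filtering: keep top tokens until their cumulative mass reaches top_p.
    kept: Dict[str, float] = {}
    cumulative = 0.0
    for tok, prob in sorted(probs.items(), key=lambda kv: kv[1], reverse=True):
        kept[tok] = prob
        cumulative += prob
        if cumulative >= top_p:
            break

    # Renormalize so the surviving probabilities sum to 1.
    mass = sum(kept.values())
    return {tok: prob / mass for tok, prob in kept.items()}


if __name__ == "__main__":
    toy_logits = {"blue": 2.0, "azzurro": 1.0, "grey": 0.0, "green": -1.0}  # made-up values
    print("old settings:", temperature_top_p(toy_logits, temperature=0.7, top_p=0.85))
    print("new settings:", temperature_top_p(toy_logits, temperature=0.6, top_p=0.9))
```

As a general rule, a lower temperature concentrates probability on the most likely tokens, while a higher `top_p` admits a larger candidate set, so the two changes pull in opposite directions and the net effect depends on the actual distribution at each step.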