m-polignano-uniba committed on
Commit 2d4b3bf
1 Parent(s): 7c2db1a

Update README.md

Files changed (1):
  1. README.md +121 -0
README.md CHANGED
@@ -253,6 +253,127 @@ The 🌟**ANITA project**🌟 *(**A**dvanced **N**atural-based interaction for t
- **Context length**: 8K, 8192.
<hr>

### Transformers

For direct use with `transformers`, you can easily get started with the following steps.

- First, install `transformers` with `pip` using the command below:

```bash
pip install -U transformers
```

- Now you can use the model directly:

```python
import torch
from transformers import (
    AutoModelForCausalLM,
    AutoTokenizer,
)

base_model = "m-polignano-uniba/LLaMAntino-3-ANITA-8B-sft-ORPO"
# Load the model in bfloat16 and let Accelerate place it on the available GPU(s).
model = AutoModelForCausalLM.from_pretrained(
    base_model,
    torch_dtype=torch.bfloat16,
    device_map="auto",
)
tokenizer = AutoTokenizer.from_pretrained(base_model)

messages = [
    {"role": "system", "content": "Answer clearly and in detail."},
    {"role": "user", "content": "Why is the sky blue?"},
]
# Build the prompt with the model's chat template and move it to the GPU.
prompt = tokenizer.apply_chat_template(messages, tokenize=False, add_generation_prompt=True)
inputs = tokenizer(prompt, return_tensors="pt", add_special_tokens=False)
for k, v in inputs.items():
    inputs[k] = v.cuda()
outputs = model.generate(**inputs, max_new_tokens=512, do_sample=True, top_p=0.85, temperature=0.7)
results = tokenizer.batch_decode(outputs)[0]
print(results)
```
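
- Optionally, if you prefer to see the answer while it is being generated, `transformers` provides a `TextStreamer` helper; the snippet below is a minimal sketch that reuses the `model`, `tokenizer`, and `inputs` objects created above.

```python
from transformers import TextStreamer

# Print decoded tokens to stdout as they are generated;
# skip_prompt avoids echoing the chat-template prompt.
streamer = TextStreamer(tokenizer, skip_prompt=True, skip_special_tokens=True)
_ = model.generate(
    **inputs,
    max_new_tokens=512,
    do_sample=True,
    top_p=0.85,
    temperature=0.7,
    streamer=streamer,
)
```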

- Additionally, you can load the model with **4-bit quantization** to reduce the required resources. You can start with the code below.

```python
import torch
from transformers import (
    AutoModelForCausalLM,
    AutoTokenizer,
    BitsAndBytesConfig,
)

base_model = "m-polignano-uniba/LLaMAntino-3-ANITA-8B-sft-ORPO"
# NF4 4-bit quantization with bfloat16 compute.
bnb_config = BitsAndBytesConfig(
    load_in_4bit=True,
    bnb_4bit_quant_type="nf4",
    bnb_4bit_compute_dtype=torch.bfloat16,
    bnb_4bit_use_double_quant=False,
)
model = AutoModelForCausalLM.from_pretrained(
    base_model,
    quantization_config=bnb_config,
    device_map="auto",
)
tokenizer = AutoTokenizer.from_pretrained(base_model)

messages = [
    {"role": "system", "content": "Answer clearly and in detail."},
    {"role": "user", "content": "Why is the sky blue?"},
]
prompt = tokenizer.apply_chat_template(messages, tokenize=False, add_generation_prompt=True)
inputs = tokenizer(prompt, return_tensors="pt", add_special_tokens=False)
for k, v in inputs.items():
    inputs[k] = v.cuda()
outputs = model.generate(**inputs, max_new_tokens=512, do_sample=True, top_p=0.85, temperature=0.7)
results = tokenizer.batch_decode(outputs)[0]
print(results)
```
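
- As a quick sanity check on the quantized model (a minimal sketch, reusing the `model` loaded above), you can ask `transformers` how much memory the loaded weights occupy via the standard `get_memory_footprint()` method.

```python
# Report the approximate size of the loaded 4-bit weights in GiB.
print(f"Model footprint: {model.get_memory_footprint() / 1024**3:.2f} GiB")
```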

### Unsloth

For direct use with `unsloth`, you can easily get started with the following steps.

- First, install `unsloth` and its dependencies with `pip` using the commands below:

```bash
pip install "unsloth[colab-new] @ git+https://github.com/unslothai/unsloth.git"
pip install --no-deps xformers trl peft accelerate bitsandbytes
```

- Initialize and optimize the model before use:

```python
from unsloth import FastLanguageModel
import torch

base_model = "m-polignano-uniba/LLaMAntino-3-ANITA-8B-sft-ORPO"
model, tokenizer = FastLanguageModel.from_pretrained(
    model_name=base_model,
    max_seq_length=8192,
    dtype=None,
    load_in_4bit=True,  # Set to False if you don't want 4-bit quantization.
)
FastLanguageModel.for_inference(model)  # Enable Unsloth's optimized inference path.
```

- Now you can use the model directly:

```python
messages = [
    {"role": "system", "content": "Answer clearly and in detail."},
    {"role": "user", "content": "Why is the sky blue?"},
]
prompt = tokenizer.apply_chat_template(messages, tokenize=False, add_generation_prompt=True)
inputs = tokenizer(prompt, return_tensors="pt", add_special_tokens=False)
for k, v in inputs.items():
    inputs[k] = v.cuda()
outputs = model.generate(**inputs, max_new_tokens=512, do_sample=True, top_p=0.85, temperature=0.7)
results = tokenizer.batch_decode(outputs)[0]
print(results)
```
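
- Note that the decoded string above still contains the prompt and the chat-template special tokens; if you only want the generated answer, a minimal sketch (reusing `inputs` and `outputs` from the step above) is to drop the prompt tokens before decoding.

```python
# Keep only the newly generated tokens, then decode without special tokens.
prompt_len = inputs["input_ids"].shape[-1]
answer = tokenizer.decode(outputs[0][prompt_len:], skip_special_tokens=True)
print(answer)
```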

<hr>

#### Unsloth

<img src="https://raw.githubusercontent.com/unslothai/unsloth/main/images/made with unsloth.png" width="200px" align="center" />