m-polignano-uniba committed on
Commit 2d4b3bf
1 Parent(s): 7c2db1a

Update README.md

Files changed (1):
  1. README.md +121 -0
README.md CHANGED
@@ -253,6 +253,127 @@ The 🌟**ANITA project**🌟 *(**A**dvanced **N**atural-based interaction for t
- **Context length**: 8K, 8192.
<hr>

### Transformers

For direct use with `transformers`, you can easily get started with the following steps.

- First, install `transformers` with `pip` using the command below:

```bash
pip install -U transformers
```

- Now you can use the model directly:

```python
import torch
from transformers import (
    AutoModelForCausalLM,
    AutoTokenizer,
)

base_model = "m-polignano-uniba/LLaMAntino-3-ANITA-8B-sft-ORPO"
# Load the model in bfloat16 and let Accelerate place it on the available GPU(s).
model = AutoModelForCausalLM.from_pretrained(
    base_model,
    torch_dtype=torch.bfloat16,
    device_map="auto",
)
tokenizer = AutoTokenizer.from_pretrained(base_model)

messages = [
    {"role": "system", "content": "Answer clearly and in detail."},
    {"role": "user", "content": "Why is the sky blue?"},
]
# Build the prompt with the model's chat template and move it to the GPU.
prompt = tokenizer.apply_chat_template(messages, tokenize=False, add_generation_prompt=True)
inputs = tokenizer(prompt, return_tensors="pt", add_special_tokens=False)
for k, v in inputs.items():
    inputs[k] = v.cuda()
outputs = model.generate(**inputs, max_new_tokens=512, do_sample=True, top_p=0.85, temperature=0.7)
results = tokenizer.batch_decode(outputs)[0]
print(results)
```
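
- Optionally, if you prefer to see the answer while it is being generated, `transformers` provides a `TextStreamer` helper; the snippet below is a minimal sketch that reuses the `model`, `tokenizer`, and `inputs` objects created above.

```python
from transformers import TextStreamer

# Print decoded tokens to stdout as they are generated;
# skip_prompt avoids echoing the chat-template prompt.
streamer = TextStreamer(tokenizer, skip_prompt=True, skip_special_tokens=True)
_ = model.generate(
    **inputs,
    max_new_tokens=512,
    do_sample=True,
    top_p=0.85,
    temperature=0.7,
    streamer=streamer,
)
```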

- Additionally, you can load the model with **4-bit quantization** to reduce the required resources. You can start with the code below.

```python
import torch
from transformers import (
    AutoModelForCausalLM,
    AutoTokenizer,
    BitsAndBytesConfig,
)

base_model = "m-polignano-uniba/LLaMAntino-3-ANITA-8B-sft-ORPO"
# NF4 4-bit quantization with bfloat16 compute.
bnb_config = BitsAndBytesConfig(
    load_in_4bit=True,
    bnb_4bit_quant_type="nf4",
    bnb_4bit_compute_dtype=torch.bfloat16,
    bnb_4bit_use_double_quant=False,
)
model = AutoModelForCausalLM.from_pretrained(
    base_model,
    quantization_config=bnb_config,
    device_map="auto",
)
tokenizer = AutoTokenizer.from_pretrained(base_model)

messages = [
    {"role": "system", "content": "Answer clearly and in detail."},
    {"role": "user", "content": "Why is the sky blue?"},
]
prompt = tokenizer.apply_chat_template(messages, tokenize=False, add_generation_prompt=True)
inputs = tokenizer(prompt, return_tensors="pt", add_special_tokens=False)
for k, v in inputs.items():
    inputs[k] = v.cuda()
outputs = model.generate(**inputs, max_new_tokens=512, do_sample=True, top_p=0.85, temperature=0.7)
results = tokenizer.batch_decode(outputs)[0]
print(results)
```
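
- As a quick sanity check on the quantized model (a minimal sketch, reusing the `model` loaded above), you can ask `transformers` how much memory the loaded weights occupy via the standard `get_memory_footprint()` method.

```python
# Report the approximate size of the loaded 4-bit weights in GiB.
print(f"Model footprint: {model.get_memory_footprint() / 1024**3:.2f} GiB")
```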

### Unsloth

For direct use with `unsloth`, you can easily get started with the following steps.

- First, install `unsloth` and its dependencies with `pip` using the commands below:

```bash
pip install "unsloth[colab-new] @ git+https://github.com/unslothai/unsloth.git"
pip install --no-deps xformers trl peft accelerate bitsandbytes
```

- Initialize and optimize the model before use:

```python
from unsloth import FastLanguageModel
import torch

base_model = "m-polignano-uniba/LLaMAntino-3-ANITA-8B-sft-ORPO"
model, tokenizer = FastLanguageModel.from_pretrained(
    model_name=base_model,
    max_seq_length=8192,
    dtype=None,
    load_in_4bit=True,  # Set to False if you don't want 4-bit quantization.
)
FastLanguageModel.for_inference(model)  # Enable Unsloth's optimized inference path.
```

- Now you can use the model directly:

```python
messages = [
    {"role": "system", "content": "Answer clearly and in detail."},
    {"role": "user", "content": "Why is the sky blue?"},
]
prompt = tokenizer.apply_chat_template(messages, tokenize=False, add_generation_prompt=True)
inputs = tokenizer(prompt, return_tensors="pt", add_special_tokens=False)
for k, v in inputs.items():
    inputs[k] = v.cuda()
outputs = model.generate(**inputs, max_new_tokens=512, do_sample=True, top_p=0.85, temperature=0.7)
results = tokenizer.batch_decode(outputs)[0]
print(results)
```
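
- Note that the decoded string above still contains the prompt and the chat-template special tokens; if you only want the generated answer, a minimal sketch (reusing `inputs` and `outputs` from the step above) is to drop the prompt tokens before decoding.

```python
# Keep only the newly generated tokens, then decode without special tokens.
prompt_len = inputs["input_ids"].shape[-1]
answer = tokenizer.decode(outputs[0][prompt_len:], skip_special_tokens=True)
print(answer)
```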

<hr>

#### Unsloth

<img src="https://raw.githubusercontent.com/unslothai/unsloth/main/images/made with unsloth.png" width="200px" align="center" />