---
license: mit
library_name: peft
tags:
- PEFT
- Qlora
- mistral-7b
- fine-tuning
base_model: mistralai/Mistral-7B-v0.1
model-index:
- name: mistral7b-fine-tuned-qlora
  results: []
datasets:
- timdettmers/openassistant-guanaco
pipeline_tag: text-generation
language:
- en
---

# Mistral7b-fine-tuned-qlora

<img src="https://www.kdnuggets.com/wp-content/uploads/selvaraj_mistral_7bv02_finetuning_mistral_new_opensource_llm_hugging_face_3.png" alt="Mistral 7B fine-tuning illustration" width="700" />

## Base model and dataset

This model is a fine-tuned version of [mistralai/Mistral-7B-v0.1](https://huggingface.co/mistralai/Mistral-7B-v0.1) on the [timdettmers/openassistant-guanaco](https://huggingface.co/datasets/timdettmers/openassistant-guanaco) dataset.
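
If you want to inspect the training data, it can be loaded directly with the `datasets` library. A minimal sketch, assuming the dataset's usual single `text` column of `### Human` / `### Assistant` exchanges:

```python
# Peek at the fine-tuning data (not required for inference)
from datasets import load_dataset

ds = load_dataset("timdettmers/openassistant-guanaco")
print(ds)                      # available splits and their sizes
print(ds["train"][0]["text"])  # one "### Human ... ### Assistant ..." exchange (assumed column name)
```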

## Usage guidance

Please refer to [this notebook](https://github.com/shirinyamani/mistral7b-lora-finetuning/blob/main/misral_7B_updated.ipynb) for a complete fine-tuning demo, including notes on cloud deployment.
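
For orientation, below is a minimal sketch of the kind of QLoRA setup the notebook demonstrates: the base model is loaded in 4-bit NF4 and only a small LoRA adapter is trained on top. The rank, alpha, dropout, and target modules shown here are illustrative assumptions, not the exact values used to train this adapter; see the notebook for the real configuration.

```python
# Illustrative QLoRA training setup (values are assumptions; see the notebook for the actual configuration)
import torch
from transformers import AutoModelForCausalLM, BitsAndBytesConfig
from peft import LoraConfig, get_peft_model, prepare_model_for_kbit_training

base_id = "mistralai/Mistral-7B-v0.1"

model = AutoModelForCausalLM.from_pretrained(
    base_id,
    device_map={"": 0},
    quantization_config=BitsAndBytesConfig(
        load_in_4bit=True,
        bnb_4bit_quant_type="nf4",
        bnb_4bit_use_double_quant=True,
        bnb_4bit_compute_dtype=torch.bfloat16,
    ),
)
model = prepare_model_for_kbit_training(model)  # prepare the quantized model for training (e.g. enable input gradients)

lora_config = LoraConfig(
    r=16,                      # assumed LoRA rank
    lora_alpha=32,             # assumed scaling factor
    lora_dropout=0.05,         # assumed dropout
    target_modules=["q_proj", "k_proj", "v_proj", "o_proj"],  # assumed attention projections
    bias="none",
    task_type="CAUSAL_LM",
)
model = get_peft_model(model, lora_config)
model.print_trainable_parameters()  # only the adapter weights are trainable
```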

## Inference

```python
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer, BitsAndBytesConfig, GenerationConfig
from peft import PeftModel

# Generation settings -- adjust as needed
max_new_tokens = 100
top_p = 0.9
temperature = 0.7
user_question = "What is the central limit theorem?"

# Base model
model_name_or_path = 'mistralai/Mistral-7B-v0.1' # Change it to 'YOUR_BASE_MODEL'
adapter_path = 'ShirinYamani/mistral7b-fine-tuned-qlora' # Change it to 'YOUR_ADAPTER_PATH'

tokenizer = AutoTokenizer.from_pretrained(model_name_or_path)
# Work around LLaMA-family tokenizers whose early HF conversions report the wrong BOS id
tokenizer.bos_token_id = 1

# Load the base model in 4-bit (NF4) with bf16 compute
model = AutoModelForCausalLM.from_pretrained(
    model_name_or_path,
    torch_dtype=torch.bfloat16,
    device_map={"": 0},
    # QLoRA 4-bit quantization config
    quantization_config=BitsAndBytesConfig(
        load_in_4bit=True,
        bnb_4bit_compute_dtype=torch.bfloat16,
        bnb_4bit_use_double_quant=True,
        bnb_4bit_quant_type='nf4',
    )
)

model = PeftModel.from_pretrained(model, adapter_path)
model.eval()

prompt = (
    "A chat between a curious human and an artificial intelligence assistant. "
    "The assistant gives helpful, detailed, and polite answers to the user's questions. "
    "### Human: {user_question}"
    "### Assistant: "
)

def generate(model, user_question, max_new_tokens=max_new_tokens, top_p=top_p, temperature=temperature):
    inputs = tokenizer(prompt.format(user_question=user_question), return_tensors="pt").to('cuda')

    outputs = model.generate(
        **inputs,
        generation_config=GenerationConfig(
            do_sample=True,
            max_new_tokens=max_new_tokens,
            top_p=top_p,
            temperature=temperature,
        )
    )

    text = tokenizer.decode(outputs[0], skip_special_tokens=True)
    print(text)
    return text

generate(model, user_question)
```
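
If you prefer a standalone checkpoint for deployment, PEFT can fold the adapter weights back into the base model. A sketch, assuming the base model is loaded in bf16 without 4-bit quantization so the merge is a plain weight update (the output directory name below is arbitrary):

```python
# Optional: merge the LoRA adapter into the base model and save a standalone checkpoint
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer
from peft import PeftModel

base = AutoModelForCausalLM.from_pretrained(
    "mistralai/Mistral-7B-v0.1",
    torch_dtype=torch.bfloat16,  # full bf16 weights here, not 4-bit, so merging is straightforward
    device_map={"": 0},
)
merged = PeftModel.from_pretrained(base, "ShirinYamani/mistral7b-fine-tuned-qlora").merge_and_unload()

merged.save_pretrained("mistral7b-qlora-merged")  # arbitrary output directory
AutoTokenizer.from_pretrained("mistralai/Mistral-7B-v0.1").save_pretrained("mistral7b-qlora-merged")
```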

### Training hyperparameters

```
- learning_rate: 0.0002
- train_batch_size: 1
- eval_batch_size: 8
- seed: 42
- gradient_accumulation_steps: 4
- total_train_batch_size: 4
- optimizer: Adam with betas=(0.9,0.999) and epsilon=1e-08
- lr_scheduler_type: linear
- lr_scheduler_warmup_steps: 2
- training_steps: 10
- mixed_precision_training: Native AMP
```
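
For anyone reproducing a comparable run, these values map onto `transformers.TrainingArguments` roughly as in the sketch below. It assumes the standard `Trainer`/`SFTTrainer` workflow; the output directory and the choice between `fp16` and `bf16` are assumptions, not the exact settings of the original run.

```python
# Rough TrainingArguments equivalent of the hyperparameters listed above (assumed Trainer/SFTTrainer workflow)
from transformers import TrainingArguments

training_args = TrainingArguments(
    output_dir="mistral7b-fine-tuned-qlora",  # hypothetical output directory
    learning_rate=2e-4,
    per_device_train_batch_size=1,
    per_device_eval_batch_size=8,
    gradient_accumulation_steps=4,  # effective train batch size of 4
    lr_scheduler_type="linear",
    warmup_steps=2,
    max_steps=10,
    seed=42,
    fp16=True,  # "Native AMP" mixed precision; bf16=True is an alternative on Ampere+ GPUs
)
```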

### Framework versions
```
- PEFT 0.11.2.dev0
- Transformers 4.42.0.dev0
- Pytorch 2.3.0+cu121
- Datasets 2.19.2
- Tokenizers 0.19.1
```