shimmyshimmer committed · Commit b549043 · Parent(s): 14a73f9
Update README.md

README.md CHANGED
@@ -10,6 +10,7 @@ tags:
 
 ---
 
+
 # Finetune Mistral, Gemma, Llama 2-5x faster with 70% less memory via Unsloth!
 
 A reupload from https://huggingface.co/TinyLlama/TinyLlama-1.1B-intermediate-step-1431k-3T
@@ -18,76 +19,22 @@ We have a Google Colab Tesla T4 notebook for TinyLlama with 4096 max sequence le
 
 [<img src="https://raw.githubusercontent.com/unslothai/unsloth/main/images/Discord%20button.png" width="200"/>](https://discord.gg/u54VK8m8tk)
 [<img src="https://raw.githubusercontent.com/unslothai/unsloth/main/images/buy%20me%20a%20coffee%20button.png" width="200"/>](https://ko-fi.com/unsloth)
-[<img src="https://raw.githubusercontent.com/unslothai/unsloth/main/images/unsloth%20made%20with%20love.png" width="
-
-```python
-from unsloth import FastLanguageModel
-import torch
-from trl import SFTTrainer
-from transformers import TrainingArguments
-from datasets import load_dataset
-max_seq_length = 2048 # Supports RoPE Scaling internally, so choose any!
-# Get LAION dataset
-url = "https://huggingface.co/datasets/laion/OIG/resolve/main/unified_chip2.jsonl"
-dataset = load_dataset("json", data_files = {"train" : url}, split = "train")
-
-# 4bit pre quantized models we support - 4x faster downloading!
-fourbit_models = [
-    "unsloth/mistral-7b-bnb-4bit",
-    "unsloth/llama-2-7b-bnb-4bit",
-    "unsloth/llama-2-13b-bnb-4bit",
-    "unsloth/codellama-34b-bnb-4bit",
-    "unsloth/tinyllama-bnb-4bit",
-] # Go to https://huggingface.co/unsloth for more 4-bit models!
-
-
-model, tokenizer = FastLanguageModel.from_pretrained(
-    model_name = "unsloth/mistral-7b-bnb-4bit", # Supports Llama, Mistral - replace this!
-    max_seq_length = max_seq_length,
-    dtype = None,
-    load_in_4bit = True,
-)
-
-
-model = FastLanguageModel.get_peft_model(
-    model,
-    r = 16,
-    target_modules = ["q_proj", "k_proj", "v_proj", "o_proj",
-                      "gate_proj", "up_proj", "down_proj",],
-    lora_alpha = 16,
-    lora_dropout = 0, # Supports any, but = 0 is optimized
-    bias = "none",    # Supports any, but = "none" is optimized
-    use_gradient_checkpointing = True,
-    random_state = 3407,
-    max_seq_length = max_seq_length,
-    use_rslora = False,  # We support rank stabilized LoRA
-    loftq_config = None, # And LoftQ
-)
-
-trainer = SFTTrainer(
-    model = model,
-    train_dataset = dataset,
-    dataset_text_field = "text",
-    max_seq_length = max_seq_length,
-    tokenizer = tokenizer,
-    args = TrainingArguments(
-        per_device_train_batch_size = 2,
-        gradient_accumulation_steps = 4,
-        warmup_steps = 10,
-        max_steps = 60,
-        fp16 = not torch.cuda.is_bf16_supported(),
-        bf16 = torch.cuda.is_bf16_supported(),
-        logging_steps = 1,
-        output_dir = "outputs",
-        optim = "adamw_8bit",
-        seed = 3407,
-    ),
-)
-trainer.train()
-
-# Go to https://github.com/unslothai/unsloth/wiki for advanced tips like
-# (1) Saving to GGUF / merging to 16bit for vLLM
-# (2) Continued training from a saved LoRA adapter
-# (3) Adding an evaluation loop / OOMs
-# (4) Customized chat templates
-```
+[<img src="https://raw.githubusercontent.com/unslothai/unsloth/main/images/unsloth%20made%20with%20love.png" width="200"/>](https://github.com/unslothai/unsloth)
+
+## ✨ Finetune for Free
+
+All notebooks are **beginner friendly**! Add your dataset, click "Run All", and you'll get a 2x faster finetuned model which can be exported to GGUF, vLLM or uploaded to Hugging Face.
+
+| Unsloth supports | Free Notebooks | Performance | Memory use |
+|------------------|----------------|-------------|------------|
+| **Gemma 7b** | [▶️ Start on Colab](https://colab.research.google.com/drive/10NbwlsRChbma1v55m8LAPYG15uQv6HLo?usp=sharing) | 2.4x faster | 58% less |
+| **Mistral 7b** | [▶️ Start on Colab](https://colab.research.google.com/drive/1Dyauq4kTZoLewQ1cApceUQVNcnnNTzg_?usp=sharing) | 2.2x faster | 62% less |
+| **Llama-2 7b** | [▶️ Start on Colab](https://colab.research.google.com/drive/1lBzz5KeZJKXjvivbYvmGarix9Ao6Wxe5?usp=sharing) | 2.2x faster | 43% less |
+| **TinyLlama** | [▶️ Start on Colab](https://colab.research.google.com/drive/1AZghoNBQaMDgWJpi4RbffGM1h6raLUj9?usp=sharing) | 3.9x faster | 74% less |
+| **CodeLlama 34b** A100 | [▶️ Start on Colab](https://colab.research.google.com/drive/1y7A0AxE3y8gdj4AVkl2aZX47Xu3P1wJT?usp=sharing) | 1.9x faster | 27% less |
+| **Mistral 7b** 1xT4 | [▶️ Start on Kaggle](https://www.kaggle.com/code/danielhanchen/kaggle-mistral-7b-unsloth-notebook) | 5x faster\* | 62% less |
+| **DPO - Zephyr** | [▶️ Start on Colab](https://colab.research.google.com/drive/15vttTpzzVXv_tJwEk-hIcQ0S9FcEWvwP?usp=sharing) | 1.9x faster | 19% less |
+
+- This [conversational notebook](https://colab.research.google.com/drive/1Aau3lgPzeZKQ-98h69CCu1UJcvIBLmy2?usp=sharing) is useful for ShareGPT ChatML / Vicuna templates.
+- This [text completion notebook](https://colab.research.google.com/drive/1ef-tab5bhkvWmBOObepl1WgJvfvSzn5Q?usp=sharing) is for raw text. This [DPO notebook](https://colab.research.google.com/drive/15vttTpzzVXv_tJwEk-hIcQ0S9FcEWvwP?usp=sharing) replicates Zephyr.
+- \* Kaggle has 2x T4s, but we use 1. Due to overhead, 1x T4 is 5x faster.
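The removed quickstart ended with wiki pointers such as "(1) Saving to GGUF / merging to 16bit for vLLM", and the new README text promises a model that "can be exported to GGUF, vLLM or uploaded to Hugging Face" without showing code for it. Below is a minimal sketch of that export step, assuming the `unsloth` package's `save_pretrained_merged` and `save_pretrained_gguf` helpers and the `unsloth/tinyllama-bnb-4bit` checkpoint named in the removed snippet; method names and quantization options can vary across unsloth versions, so treat this as illustrative rather than the notebooks' exact cells.

```python
from unsloth import FastLanguageModel

# Reload the base model in 4-bit, exactly as the removed quickstart does.
model, tokenizer = FastLanguageModel.from_pretrained(
    model_name = "unsloth/tinyllama-bnb-4bit",
    max_seq_length = 2048,
    dtype = None,        # auto-detects bfloat16 on Ampere+ GPUs
    load_in_4bit = True,
)

# ... attach LoRA adapters and finetune with SFTTrainer as in the removed snippet ...

# Merge the LoRA weights into 16-bit weights loadable by vLLM
# (assumed helper from the Unsloth wiki, not shown in this README):
model.save_pretrained_merged("outputs/merged_16bit", tokenizer, save_method = "merged_16bit")

# Quantize to GGUF for llama.cpp (assumed helper, same caveat):
model.save_pretrained_gguf("outputs/gguf", tokenizer, quantization_method = "q4_k_m")
```

Uploading to Hugging Face follows the same pattern with the corresponding `push_to_hub_merged` / `push_to_hub_gguf` variants, given a valid access token.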