---
license: mit
datasets:
- OpenAssistant/oasst1
language:
- en
tags:
- sft
pipeline_tag: text-generation
widget:
- text: >-
    <|prompter|>What is a meme, and what's the history behind this
    word?</s><|assistant|>
- text: <|prompter|>What's the Earth total population</s><|assistant|>
- text: <|prompter|>Write a story about future of AI development</s><|assistant|>
---

# LoRA Adapter for Falcon 40B trained on oasst-top1

This repo contains a low-rank adapter for **Falcon 40B**, fine-tuned on data from the OpenAssistant project.

This version of the weights was trained with the following hyperparameters:

- Epochs: 8
- Batch size: 128
- Max Length: 2048
- Learning rate: 1e-4
- LoRA _r_: 64
- LoRA alpha: 16
- LoRA target modules: ["dense_4h_to_h", "dense", "query_key_value", "dense_h_to_4h"]

The model was trained with flash attention, gradient checkpointing, and DeepSpeed ZeRO stage 3 on 8 x A100 80GB GPUs.
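
As a point of reference, this is roughly how the LoRA settings above map onto a `peft` `LoraConfig`. It is only a sketch, not the original training script: `lora_dropout` and `bias` are not stated in this card and are assumptions.

```
from peft import LoraConfig, TaskType

lora_config = LoraConfig(
    task_type=TaskType.CAUSAL_LM,
    r=64,
    lora_alpha=16,
    lora_dropout=0.05,   # assumption: not stated in this card
    bias="none",         # assumption: not stated in this card
    target_modules=["dense_4h_to_h", "dense", "query_key_value", "dense_h_to_4h"],
)

# Applying it to the frozen base model (illustrative only; Falcon 40B itself
# needs substantial GPU memory to instantiate):
# from peft import get_peft_model
# from transformers import AutoModelForCausalLM
# base = AutoModelForCausalLM.from_pretrained("tiiuae/falcon-40b", trust_remote_code=True)
# model = get_peft_model(base, lora_config)
```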

## Dataset Details
- oasst_export:
  - lang: "bg,ca,cs,da,de,en,es,fr,hr,hu,it,nl,pl,pt,ro,ru,sl,sr,sv,uk"
  - input_file_path: 2023-04-12_oasst_release_ready_synth.jsonl.gz
  - val_split: 0.05

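The dataset entry in the metadata above points to `OpenAssistant/oasst1`, the public release of the OpenAssistant conversations. A minimal sketch of loading it with the `datasets` library; note that oasst1 stores individual messages in conversation trees, so building `<|prompter|>`/`<|assistant|>` training sequences requires additional flattening:

```
from datasets import load_dataset

# Public OpenAssistant conversations release (per-message records, not flattened dialogues)
ds = load_dataset("OpenAssistant/oasst1")
example = ds["train"][0]
print(example["role"])   # "prompter" or "assistant"
print(example["text"])   # the message text
print(example["lang"])   # language code, e.g. "en"
```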

## Model Details

- **Developed** as part of the OpenAssistant Project
- **Model type:** PEFT Adapter for frozen Falcon
- **Language:** English

## Prompting

Two special tokens are used to mark the beginning of user and assistant turns:
`<|prompter|>` and `<|assistant|>`. Each turn ends with a `</s>` token, as in the examples and inference code below.

Input prompt example:
```
<|prompter|>What is a meme, and what's the history behind this word?</s><|assistant|>
```
The input ends with the `<|assistant|>` token to signal that the model should
start generating the assistant reply.
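
For multi-turn conversations the same pattern repeats: each earlier turn is closed with `</s>` and the prompt again ends with `<|assistant|>`. A sketch of the expected layout:

```
<|prompter|>What is a meme?</s><|assistant|>A meme is an idea or joke that spreads ...</s><|prompter|>Where does the word come from?</s><|assistant|>
```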


## Example Inference Code

Note: several extra token embeddings need to be loaded along with the LoRA weights. The example below assumes a GPU and `torch.bfloat16`:
```
import torch
import transformers
from huggingface_hub import hf_hub_download
from peft import PeftModel
from transformers import GenerationConfig

device = "cuda" if torch.cuda.is_available() else "cpu"
dtype = torch.bfloat16
repo_id = "jordiclive/falcon_lora_40b_ckpt_500_oasst_1"
base_model = "tiiuae/falcon-40b"


# Model Loading
def transfer_embeddings(model, embed_path, tokenizer):
    # Build a new input-embedding matrix that keeps the original Falcon rows and
    # appends the extra rows for the OpenAssistant special tokens.
    old_embeddings = model.get_input_embeddings()
    old_num_tokens, old_embedding_dim = old_embeddings.weight.size()
    new_embeddings = torch.nn.Embedding(old_num_tokens, old_embedding_dim)
    new_embeddings.to(old_embeddings.weight.device, dtype=old_embeddings.weight.dtype)
    model._init_weights(new_embeddings)
    embed_weights = torch.load(embed_path, map_location=old_embeddings.weight.device)
    if hasattr(embed_weights, "weight"):
        # extra_embeddings.pt may store an nn.Embedding module rather than a raw tensor
        embed_weights = embed_weights.weight.data
    vocab_size = tokenizer.vocab_size
    new_embeddings.weight.data[:vocab_size, :] = old_embeddings.weight.data[:vocab_size, :]
    new_embeddings.weight.data[vocab_size : vocab_size + embed_weights.shape[0], :] = embed_weights.to(
        new_embeddings.weight.dtype
    ).to(new_embeddings.weight.device)
    model.set_input_embeddings(new_embeddings)
    model.tie_weights()


def load_peft_model(model, peft_model_path, tokenizer):
    # Download the extra token embeddings, resize the vocabulary to make room for them,
    # attach the LoRA adapter, then copy the embeddings into the model.
    embed_path = hf_hub_download(peft_model_path, "extra_embeddings.pt")
    embed_weights = torch.load(embed_path, map_location="cpu")
    if hasattr(embed_weights, "weight"):
        embed_weights = embed_weights.weight
    model.resize_token_embeddings(tokenizer.vocab_size + embed_weights.shape[0])
    model.config.eos_token_id = tokenizer.eos_token_id
    model.config.bos_token_id = tokenizer.bos_token_id
    model.config.pad_token_id = tokenizer.pad_token_id
    model = PeftModel.from_pretrained(
        model,
        model_id=peft_model_path,
        torch_dtype=model.dtype,
    )
    model.eos_token_id = tokenizer.eos_token_id
    transfer_embeddings(model, embed_path, tokenizer)
    return model


tokenizer = transformers.AutoTokenizer.from_pretrained(repo_id)

model = transformers.AutoModelForCausalLM.from_pretrained(
    base_model, torch_dtype=dtype, trust_remote_code=True,
)
model = load_peft_model(model, repo_id, tokenizer)


# device configuration
model = model.to(device)


# Choose Generation parameters
generation_config = GenerationConfig(
    temperature=0.1,
    top_p=0.75,
    top_k=40,
    num_beams=4,
)


def format_system_prompt(prompt, eos_token="</s>"):
    # OpenAssistant prompt format: <|prompter|>{prompt}</s><|assistant|>
    return "{}{}{}{}".format("<|prompter|>", prompt, eos_token, "<|assistant|>")


def generate(prompt, generation_config=generation_config, max_new_tokens=2048, device=device):
    prompt = format_system_prompt(prompt)  # OpenAssistant Prompt Format expected
    input_ids = tokenizer(prompt, return_tensors="pt").input_ids.to(device)
    with torch.no_grad():
        generation_output = model.generate(
            input_ids=input_ids,
            generation_config=generation_config,
            return_dict_in_generate=True,
            output_scores=True,
            max_new_tokens=max_new_tokens,
            eos_token_id=model.eos_token_id,
        )
    s = generation_output.sequences[0]
    output = tokenizer.decode(s)
    print("Text generated:")
    print(output)
    return output


generate("What is a meme, and what's the history behind this word?")
generate("What's the Earth total population")
generate("Write a story about future of AI development")
```
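
Falcon 40B in `bfloat16` needs roughly 80 GB of accelerator memory for the weights alone, so the example above assumes a correspondingly large GPU setup. A minimal sketch, assuming `bitsandbytes` and `accelerate` are installed, of loading the base model in 8-bit and letting it shard across the available devices instead; the adapter and embedding loading step is unchanged:

```
import transformers

base_model = "tiiuae/falcon-40b"
repo_id = "jordiclive/falcon_lora_40b_ckpt_500_oasst_1"

tokenizer = transformers.AutoTokenizer.from_pretrained(repo_id)
model = transformers.AutoModelForCausalLM.from_pretrained(
    base_model,
    load_in_8bit=True,      # quantize linear layers at load time (bitsandbytes)
    device_map="auto",      # let accelerate shard the model across available devices
    trust_remote_code=True,
)
model = load_peft_model(model, repo_id, tokenizer)  # helper from the example above
# With device_map="auto" the model is already placed on devices; skip model.to(device).
```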