---
license: mit
datasets:
- Nebulous/gpt4all_pruned
- sahil2801/CodeAlpaca-20k
- yahma/alpaca-cleaned
language:
- en
tags:
- sft
pipeline_tag: text-generation
widget:
- text: <|prompter|>What is a meme, and what's the history behind this word?</s><|assistant|>
- text: <|prompter|>What's the Earth total population</s><|assistant|>
- text: <|prompter|>Write a story about future of AI development</s><|assistant|>
---
# LoRA Adapter for LLaMA 7B trained on more datasets than tloen/alpaca-lora-7b
This repo contains a low-rank adapter for **LLaMA-7b** fine-tuned on the following datasets:
- `Nebulous/gpt4all_pruned`
- `sahil2801/CodeAlpaca-20k`
- `yahma/alpaca-cleaned`
- Datasets that are part of the OpenAssistant project
This version of the weights was trained with the following hyperparameters:
- Epochs: 2
- Batch size: 128
- Max Length: 2048
- Learning rate: 4e-6
- LoRA _r_: 8
- LoRA alpha: 32
- LoRA target modules: q_proj, k_proj, v_proj, o_proj
The model was trained with flash attention and gradient checkpointing.
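To give a sense of scale for the LoRA settings above, the sketch below counts the parameters the adapter adds. It assumes the standard LLaMA-7B shape (hidden size 4096, 32 decoder layers, square attention projections), which this card does not state, and the usual LoRA scaling of alpha / r. The name `lora_param_count` is illustrative and not part of this repo.

```python
# Assumed LLaMA-7B dimensions (not stated in this model card).
HIDDEN_SIZE = 4096
NUM_LAYERS = 32

# Hyperparameters taken from the list above.
LORA_R = 8
LORA_ALPHA = 32
TARGET_MODULES = ["q_proj", "k_proj", "v_proj", "o_proj"]


def lora_param_count(in_features, out_features, r):
    """Parameters in one LoRA pair: A is (r x in), B is (out x r)."""
    return r * in_features + out_features * r


per_module = lora_param_count(HIDDEN_SIZE, HIDDEN_SIZE, LORA_R)
total = per_module * len(TARGET_MODULES) * NUM_LAYERS
scaling = LORA_ALPHA / LORA_R  # factor applied to the low-rank update

print(f"LoRA parameters per projection: {per_module:,}")
print(f"Total LoRA parameters: {total:,}")
print(f"Scaling factor (alpha / r): {scaling}")
```

Under these assumptions the adapter adds about 8.4 million parameters, not counting the extra special-token embeddings shipped with this repo.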
## Model Details
- **Developed** as part of the OpenAssistant Project
- **Model type:** PEFT Adapter for frozen LLaMA
- **Language:** English
## Prompting
Two special tokens are used to mark the beginning of user and assistant turns:
`<|prompter|>` and `<|assistant|>`. Each turn ends with the end-of-sequence token `</s>`.
Input prompt example:
```
<|prompter|>What is a meme, and what's the history behind this word?</s><|assistant|>
```
The input ends with the `<|assistant|>` token to signal that the model should
start generating the assistant reply.
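The format can also be assembled programmatically. The helper below is a minimal sketch: `build_prompt` and the `(role, text)` tuple layout are hypothetical names chosen for illustration, and the multi-turn layout is an assumption extrapolated from the single-turn example above, with `</s>` closing every turn.

```python
PROMPTER = "<|prompter|>"
ASSISTANT = "<|assistant|>"
EOS = "</s>"  # turn terminator, as used in the example above


def build_prompt(turns):
    """Format (role, text) pairs, where role is "prompter" or "assistant".

    The conversation must end with a prompter turn; the result ends with
    the assistant token so the model continues from there.
    """
    if not turns or turns[-1][0] != "prompter":
        raise ValueError("the conversation must end with a prompter turn")
    parts = []
    for role, text in turns:
        if role not in ("prompter", "assistant"):
            raise ValueError(f"unknown role: {role!r}")
        prefix = PROMPTER if role == "prompter" else ASSISTANT
        parts.append(f"{prefix}{text}{EOS}")
    return "".join(parts) + ASSISTANT


print(build_prompt([("prompter", "What's the Earth total population")]))
```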
# Example Inference Code
Several special-token embeddings need to be loaded along with the LoRA weights. The example assumes a GPU and `torch.float16`.
```python
import torch
import transformers
from huggingface_hub import hf_hub_download
from peft import PeftModel
from transformers import GenerationConfig
device = "cuda" if torch.cuda.is_available() else "cpu"

tokenizer = transformers.AutoTokenizer.from_pretrained("jordiclive/gpt4all-alpaca-oa-codealpaca-lora-7b")

# Load the base model in half precision.
model = transformers.AutoModelForCausalLM.from_pretrained(
    "decapoda-research/llama-7b-hf", torch_dtype=torch.float16
)

# This repo also contains embeddings for several special tokens,
# so the embedding matrix is resized to hold them before they are loaded.
model.resize_token_embeddings(32016)
model.config.eos_token_id = tokenizer.eos_token_id
model.config.bos_token_id = tokenizer.bos_token_id
model.config.pad_token_id = tokenizer.pad_token_id

# Load the LoRA adapter on top of the base model.
lora_weights = "jordiclive/gpt4all-alpaca-oa-codealpaca-lora-7b"
model = PeftModel.from_pretrained(
    model,
    lora_weights,
    torch_dtype=torch.float16,
)
model.eos_token_id = tokenizer.eos_token_id

# Load the embeddings for the special tokens.
filename = hf_hub_download("jordiclive/gpt4all-alpaca-oa-codealpaca-lora-7b", "extra_embeddings.pt")
embed_weights = torch.load(filename, map_location=torch.device(device))

# Copy the special-token embeddings into rows 32000 and up.
# no_grad guards against an in-place modification error if the weight requires grad.
embed_tokens = model.base_model.model.model.embed_tokens
with torch.no_grad():
    embed_tokens.weight[32000:, :] = embed_weights.to(
        device=embed_tokens.weight.device, dtype=embed_tokens.weight.dtype
    )

model = model.half().to(device)
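# With num_beams=4 and do_sample left at its default, generation uses beam search;
# temperature, top_p and top_k only take effect when sampling is enabled.
generation_config = GenerationConfig(
    temperature=0.1,
    top_p=0.75,
    top_k=40,
    num_beams=4,
)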
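def format_system_prompt(prompt, eos_token="</s>"):
    # OpenAssistant format: prompter turn, EOS, then the assistant token
    # so that the model starts generating the reply.
    return "{}{}{}{}".format(
        "<|prompter|>",
        prompt,
        eos_token,
        "<|assistant|>",
    )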
def generate(prompt, generation_config=generation_config, max_new_tokens=2048, device=device):
    prompt = format_system_prompt(prompt)  # OpenAssistant prompt format expected
    input_ids = tokenizer(prompt, return_tensors="pt").input_ids.to(device)
    with torch.no_grad():
        generation_output = model.generate(
            input_ids=input_ids,
            generation_config=generation_config,
            return_dict_in_generate=True,
            output_scores=True,
            max_new_tokens=max_new_tokens,
            eos_token_id=2,
        )
    s = generation_output.sequences[0]
    output = tokenizer.decode(s)
    print("Text generated:")
    print(output)
    return output
generate("What is a meme, and what's the history behind this word?")
generate("What's the Earth total population")
generate("Write a story about future of AI development")
```
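The string returned by `generate` is the full decoded sequence, so it contains the prompt and the special tokens as well as the reply. A small helper along the following lines could isolate the reply. `extract_reply` is a hypothetical name, and the sketch assumes the special tokens appear verbatim in the decoded string.

```python
ASSISTANT = "<|assistant|>"
EOS = "</s>"


def extract_reply(decoded):
    """Return the text after the last assistant token, without a trailing EOS."""
    # rpartition yields the whole string as the tail if the token is absent.
    _, _, reply = decoded.rpartition(ASSISTANT)
    reply = reply.strip()
    if reply.endswith(EOS):
        reply = reply[: -len(EOS)]
    return reply.strip()


decoded = "<|prompter|>Hi</s><|assistant|>Hello there!</s>"
print(extract_reply(decoded))
```

Splitting on the last `<|assistant|>` token rather than the first keeps the helper usable if earlier assistant turns are present in the prompt.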