aao331 committed on
Commit
b6d2ca2
1 Parent(s): d4b3523

Create README.md

  1. README.md +114 -0
README.md ADDED
@@ -0,0 +1,114 @@
+ ---
+ language:
+ - en
+ - es
+ ---
+
+ # Model Card for Carpincho-30B
+
+ <!-- Provide a quick summary of what the model is/does. -->
+
+ This is the Carpincho-30B QLoRA 4-bit checkpoint, an instruction-tuned LLM based on LLaMA-30B. It is trained to answer in colloquial Argentine Spanish.
+
+ It was trained on two RTX 3090 GPUs (48 GB of VRAM in total) for 120 hours using the Hugging Face QLoRA code (4-bit quantization).
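As a rough illustration of why 4-bit quantization matters on this hardware, the sketch below estimates the size of the raw weights at 16-bit and 4-bit precision. The parameter count of about 32.5 billion for LLaMA-30B is an assumption taken from the LLaMA paper rather than from this card, and the estimate ignores quantization constants, activations, and optimizer state, so real usage is higher.

```python
# Back-of-the-envelope weight-size estimate (a sketch, not a measurement).
# ASSUMPTION: LLaMA-30B has roughly 32.5 billion parameters.
PARAMS = 32.5e9
GPU_MEMORY_GB = 48  # two 24 GB cards, as stated in the card


def weight_size_gb(num_params: float, bits_per_param: int) -> float:
    """Return the size of the raw weights in gigabytes (1 GB = 1e9 bytes)."""
    return num_params * bits_per_param / 8 / 1e9


fp16_gb = weight_size_gb(PARAMS, 16)
int4_gb = weight_size_gb(PARAMS, 4)

print(f"16-bit weights: {fp16_gb:.2f} GB (fits in {GPU_MEMORY_GB} GB: {fp16_gb < GPU_MEMORY_GB})")
print(f" 4-bit weights: {int4_gb:.2f} GB (fits in {GPU_MEMORY_GB} GB: {int4_gb < GPU_MEMORY_GB})")
```

Under these assumptions the 16-bit weights alone exceed the 48 GB available, while the 4-bit weights leave room for the LoRA parameters and activations.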
+
+ ## Model Details
+
+ The model is provided in LoRA format.
+
+ ## Usage
+
+ Here is example inference code. You will need to install the requirements for https://github.com/johnsmith0031/alpaca_lora_4bit first.
+
+ ```python
+ import torch
+ from peft import PeftModel
+ from transformers import AutoModelForCausalLM, LlamaTokenizer
+
+ # Local path to the base LLaMA-30B weights and name of the Carpincho LoRA adapter
+ model_name = "models/huggyllama_llama-30b/"
+ adapters_name = 'carpincho-30b-qlora'
+
+ # Alpaca-style prompt; %s is replaced by the user instruction
+ PROMPT_TEMPLATE = '''Below is an instruction that describes a task. Write a response that appropriately completes the request.
+ ### Instruction:
+ %s
+ ### Response:
+ '''
+
+ print(f"Starting to load the model {model_name} into memory")
+
+ # Load the base model in 4-bit, filling the available GPUs one after another
+ model = AutoModelForCausalLM.from_pretrained(
+     model_name,
+     load_in_4bit=True,
+     torch_dtype=torch.bfloat16,
+     device_map="sequential"
+ )
+
+ print(f"Loading {adapters_name} into memory")
+ # Apply the LoRA adapter on top of the quantized base model
+ model = PeftModel.from_pretrained(model, adapters_name)
+ tokenizer = LlamaTokenizer.from_pretrained(model_name)
+ tokenizer.bos_token_id = 1
+
+ # Defined in the original example but not used by the generate() call below
+ stop_token_ids = [0]
+
+ print(f"Successfully loaded the model {model_name} into memory")
+
+ def main(tokenizer):
+     prompt = PROMPT_TEMPLATE % "Hola, como estas?"
+
+     # Tokenize the prompt and move the tensors to the GPU
+     batch = tokenizer(prompt, return_tensors="pt")
+     batch = {k: v.cuda() for k, v in batch.items()}
+
+     # Sample up to 100 new tokens without tracking gradients
+     with torch.no_grad():
+         generated = model.generate(inputs=batch["input_ids"],
+                                    do_sample=True, use_cache=True,
+                                    repetition_penalty=1.1,
+                                    max_new_tokens=100,
+                                    temperature=0.9,
+                                    top_p=0.95,
+                                    top_k=40,
+                                    return_dict_in_generate=True,
+                                    output_attentions=False,
+                                    output_hidden_states=False,
+                                    output_scores=False)
+     # Decode the first (and only) sequence; the output includes the prompt
+     result_text = tokenizer.decode(generated['sequences'].cpu().tolist()[0])
+     print(result_text)
+
+ main(tokenizer)
+ ```
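The inference example prints the prompt together with the generated answer. A minimal sketch of how the prompt could be built and the answer isolated is shown below. The helper names `build_prompt` and `extract_response` are hypothetical and not part of this model card or of any library, and stripping the `<s>` and `</s>` markers assumes the tokenizer emits them as plain text when decoding.

```python
# Hypothetical helpers for the Alpaca-style prompt used in this card.
# Nothing here loads the model; it only handles strings.
PROMPT_TEMPLATE = (
    "Below is an instruction that describes a task. "
    "Write a response that appropriately completes the request.\n"
    "### Instruction:\n"
    "%s\n"
    "### Response:\n"
)
RESPONSE_MARKER = "### Response:\n"


def build_prompt(instruction: str) -> str:
    """Fill the template with a single user instruction."""
    return PROMPT_TEMPLATE % instruction.strip()


def extract_response(decoded: str) -> str:
    """Return the text after the last response marker.

    If the marker is missing, the whole decoded text is returned.
    ASSUMPTION: special tokens appear literally as "<s>" and "</s>".
    """
    tail = decoded.rpartition(RESPONSE_MARKER)[2]
    for token in ("<s>", "</s>"):
        tail = tail.replace(token, "")
    return tail.strip()


if __name__ == "__main__":
    prompt = build_prompt("Hola, como estas?")
    fake_output = "<s> " + prompt + "Todo bien, che.</s>"  # stand-in for decoded model output
    print(extract_response(fake_output))
```

In the inference code, `extract_response(result_text)` would be applied to the decoded string before printing it.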
+
+ ### Model Description
+
+ <!-- Provide a longer summary of what this model is. -->
+
+ - **Developed by:** Alfredo Ortega (@ortegaalfredo)
+ - **Model type:** 30B LLM QLoRA
+ - **Language(s) (NLP):** English and colloquial Argentine Spanish
+ - **License:** Free for non-commercial use, but I'm not the police.
+ - **Finetuned from model:** https://huggingface.co/huggyllama/llama-30b
+
+ ### Model Sources
+
+ <!-- Provide the basic links for the model. -->
+
+ - **Repository:** https://huggingface.co/huggyllama/llama-30b
+ - **Paper:** https://arxiv.org/abs/2302.13971
+
+ ## Uses
+
+ <!-- Address questions around how the model is intended to be used, including the foreseeable users of the model and those affected by the model. -->
+ This is a generic LLM chatbot that can be used to interact directly with humans.
+
+ ## Bias, Risks, and Limitations
+
+ <!-- This section is meant to convey both technical and sociotechnical limitations. -->
+ This bot is uncensored and may give shocking answers. It also carries the biases present in its training material.
+
+ ### Recommendations
+
+ <!-- This section is meant to convey recommendations with respect to the bias, risk, and technical limitations. -->
+
+ Users (both direct and downstream) should be made aware of the risks, biases and limitations of the model.
+
+ ## Model Card Contact
+
+ Contact the creator at @ortegaalfredo on Twitter or GitHub.