---
language:
- en
- es
---

# Model Card for Carpincho-30b

<!-- Provide a quick summary of what the model is/does. -->

This is the Carpincho-30B QLoRA 4-bit checkpoint, an instruction-tuned LLM based on LLaMA-30B. It is trained to answer in colloquial Argentine Spanish.

It was trained on 2x RTX 3090 GPUs (48 GB of VRAM in total) for 120 hours using the Hugging Face QLoRA code (4-bit quantization).
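
For context, a QLoRA fine-tune like this one is typically configured with bitsandbytes 4-bit (NF4) quantization plus a PEFT `LoraConfig`. The sketch below is a representative setup only; the rank, target modules, and other hyperparameters are illustrative assumptions, not the values actually used to train Carpincho-30B.

```python
import torch
from transformers import AutoModelForCausalLM, BitsAndBytesConfig
from peft import LoraConfig, get_peft_model

# Illustrative QLoRA setup; these hyperparameters are assumptions,
# not the actual Carpincho-30B training configuration.
bnb_config = BitsAndBytesConfig(
    load_in_4bit=True,
    bnb_4bit_quant_type="nf4",             # NormalFloat4 quantization from the QLoRA paper
    bnb_4bit_compute_dtype=torch.bfloat16,
    bnb_4bit_use_double_quant=True,
)

base = AutoModelForCausalLM.from_pretrained(
    "huggyllama/llama-30b",
    quantization_config=bnb_config,
    device_map="auto",                     # spread the 30B model across both GPUs
)

lora_config = LoraConfig(
    r=16,                                  # LoRA rank (assumed)
    lora_alpha=32,
    lora_dropout=0.05,
    target_modules=["q_proj", "v_proj"],   # attention projections (assumed)
    task_type="CAUSAL_LM",
)
model = get_peft_model(base, lora_config)
model.print_trainable_parameters()         # only the adapter weights are trained
```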

## Model Details

The model is provided as a LoRA adapter, not a full checkpoint: the adapter must be loaded on top of the base LLaMA-30B weights, as shown in the usage example below.
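
If you would rather deploy a standalone checkpoint, the adapter can be folded into full-precision base weights with PEFT's `merge_and_unload`. A minimal sketch, assuming the base model is loaded unquantized (merging is not supported on 4-bit weights) and enough memory is available; the output path is hypothetical:

```python
import torch
from peft import PeftModel
from transformers import AutoModelForCausalLM

# Load the base model unquantized (merging requires full-precision weights)
base = AutoModelForCausalLM.from_pretrained(
    "huggyllama/llama-30b",
    torch_dtype=torch.float16,
    device_map="cpu",
)

# Apply the adapter, then fold the LoRA deltas into the base weights
model = PeftModel.from_pretrained(base, "carpincho-30b-qlora")
merged = model.merge_and_unload()
merged.save_pretrained("carpincho-30b-merged")  # hypothetical output path
```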

## Usage

Here is example inference code; you will first need to install the requirements for https://github.com/johnsmith0031/alpaca_lora_4bit.

```python
import torch
from peft import PeftModel
from transformers import AutoModelForCausalLM, LlamaTokenizer

model_name = "models/huggyllama_llama-30b/"
adapters_name = 'carpincho-30b-qlora'

print(f"Starting to load the model {model_name} into memory")

# Load the base LLaMA-30B weights with 4-bit quantization
model = AutoModelForCausalLM.from_pretrained(
    model_name,
    load_in_4bit=True,
    torch_dtype=torch.bfloat16,
    device_map="sequential"
)

print(f"Loading {adapters_name} into memory")

# Apply the Carpincho LoRA adapter on top of the base model
model = PeftModel.from_pretrained(model, adapters_name)
tokenizer = LlamaTokenizer.from_pretrained(model_name)
tokenizer.bos_token_id = 1

print(f"Successfully loaded the model {model_name} into memory")

def main(tokenizer):
    # Alpaca-style instruction prompt
    prompt = '''Below is an instruction that describes a task. Write a response that appropriately completes the request.

### Instruction:
%s

### Response:
''' % "Hola, como estas?"

    batch = tokenizer(prompt, return_tensors="pt")
    batch = {k: v.cuda() for k, v in batch.items()}

    with torch.no_grad():
        generated = model.generate(
            inputs=batch["input_ids"],
            do_sample=True,
            use_cache=True,
            repetition_penalty=1.1,
            max_new_tokens=100,
            temperature=0.9,
            top_p=0.95,
            top_k=40,
            return_dict_in_generate=True,
        )
    result_text = tokenizer.decode(generated['sequences'].cpu().tolist()[0])
    print(result_text)

main(tokenizer)
```
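
For interactive use you may want to stream tokens as they are generated instead of waiting for the full completion. Below is a minimal streaming sketch using transformers' `TextIteratorStreamer`; it assumes the `model` and `tokenizer` objects from the script above are already loaded, and the instruction text is only an example.

```python
from threading import Thread
from transformers import TextIteratorStreamer

# Reuses `model` and `tokenizer` from the script above.
prompt = '''Below is an instruction that describes a task. Write a response that appropriately completes the request.

### Instruction:
Contame un chiste corto.

### Response:
'''
inputs = tokenizer(prompt, return_tensors="pt").to("cuda")

# The streamer yields decoded text chunks as generate() produces tokens.
streamer = TextIteratorStreamer(tokenizer, skip_prompt=True, skip_special_tokens=True)

# generate() blocks, so run it in a background thread and consume the stream here.
thread = Thread(target=model.generate, kwargs=dict(
    inputs=inputs["input_ids"],
    streamer=streamer,
    do_sample=True,
    max_new_tokens=100,
    temperature=0.9,
    top_p=0.95,
))
thread.start()

for chunk in streamer:
    print(chunk, end="", flush=True)
thread.join()
```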

### Model Description

<!-- Provide a longer summary of what this model is. -->

- **Developed by:** Alfredo Ortega (@ortegaalfredo)
- **Model type:** 30B LLM QLoRA
- **Language(s) (NLP):** English and colloquial Argentine Spanish
- **License:** Free for non-commercial use, but I'm not the police.
- **Finetuned from model:** https://huggingface.co/huggyllama/llama-30b

### Model Sources

<!-- Provide the basic links for the model. -->

- **Repository:** https://huggingface.co/huggyllama/llama-30b
- **Paper:** https://arxiv.org/abs/2302.13971

## Uses

<!-- Address questions around how the model is intended to be used, including the foreseeable users of the model and those affected by the model. -->

This is a generic LLM chatbot that can be used to interact directly with humans.
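
As an illustration, a minimal chat loop can be built around the Alpaca-style template from the Usage section. The sketch below is only one way to wire this up: the `ask` helper is hypothetical, and `model` and `tokenizer` are assumed to be loaded as shown above.

```python
import torch  # model and tokenizer are assumed loaded per the Usage section

def ask(instruction: str) -> str:
    # Wrap the user's message in the Alpaca-style template used above
    prompt = '''Below is an instruction that describes a task. Write a response that appropriately completes the request.

### Instruction:
%s

### Response:
''' % instruction
    batch = tokenizer(prompt, return_tensors="pt").to("cuda")
    with torch.no_grad():
        out = model.generate(inputs=batch["input_ids"], do_sample=True,
                             max_new_tokens=200, temperature=0.9, top_p=0.95)
    text = tokenizer.decode(out[0], skip_special_tokens=True)
    # Return only the text generated after the "### Response:" marker
    return text.split("### Response:")[-1].strip()

while True:
    question = input("> ")
    if not question:
        break
    print(ask(question))
```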

## Bias, Risks, and Limitations

<!-- This section is meant to convey both technical and sociotechnical limitations. -->

This model is uncensored and may produce shocking answers. It also reflects the biases present in its training material.

### Recommendations

<!-- This section is meant to convey recommendations with respect to the bias, risk, and technical limitations. -->

Users (both direct and downstream) should be made aware of the risks, biases, and limitations of the model.

## Model Card Contact

Contact the creator at @ortegaalfredo on Twitter/GitHub.