---
language:
- en
- es
---

# Model Card for Carpincho-30b

<!-- Provide a quick summary of what the model is/does. -->

This is the Carpincho-30B QLoRA 4-bit checkpoint, an instruction-tuned LLM based on LLaMA-30B. It is trained to answer in colloquial Argentine Spanish.

It was trained on 2x RTX 3090 GPUs (48 GB of VRAM in total) for 120 hours using the Hugging Face QLoRA code (4-bit quantization).
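
For context, a QLoRA fine-tune like this one is typically configured with bitsandbytes 4-bit (NF4) quantization plus a PEFT `LoraConfig`. The sketch below is a representative setup only; the rank, target modules, and other hyperparameters are illustrative assumptions, not the values actually used to train Carpincho-30B.

```python
import torch
from transformers import AutoModelForCausalLM, BitsAndBytesConfig
from peft import LoraConfig, get_peft_model

# Illustrative QLoRA setup; these hyperparameters are assumptions,
# not the actual Carpincho-30B training configuration.
bnb_config = BitsAndBytesConfig(
    load_in_4bit=True,
    bnb_4bit_quant_type="nf4",             # NormalFloat4 quantization from the QLoRA paper
    bnb_4bit_compute_dtype=torch.bfloat16,
    bnb_4bit_use_double_quant=True,
)

base = AutoModelForCausalLM.from_pretrained(
    "huggyllama/llama-30b",
    quantization_config=bnb_config,
    device_map="auto",                     # spread the 30B model across both GPUs
)

lora_config = LoraConfig(
    r=16,                                  # LoRA rank (assumed)
    lora_alpha=32,
    lora_dropout=0.05,
    target_modules=["q_proj", "v_proj"],   # attention projections (assumed)
    task_type="CAUSAL_LM",
)
model = get_peft_model(base, lora_config)
model.print_trainable_parameters()         # only the adapter weights are trained
```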

## Model Details

The model is provided as a LoRA adapter, not a full checkpoint: the adapter must be loaded on top of the base LLaMA-30B weights, as shown in the usage example below.
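
If you would rather deploy a standalone checkpoint, the adapter can be folded into full-precision base weights with PEFT's `merge_and_unload`. A minimal sketch, assuming the base model is loaded unquantized (merging is not supported on 4-bit weights) and enough memory is available; the output path is hypothetical:

```python
import torch
from peft import PeftModel
from transformers import AutoModelForCausalLM

# Load the base model unquantized (merging requires full-precision weights)
base = AutoModelForCausalLM.from_pretrained(
    "huggyllama/llama-30b",
    torch_dtype=torch.float16,
    device_map="cpu",
)

# Apply the adapter, then fold the LoRA deltas into the base weights
model = PeftModel.from_pretrained(base, "carpincho-30b-qlora")
merged = model.merge_and_unload()
merged.save_pretrained("carpincho-30b-merged")  # hypothetical output path
```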

## Usage

Here is example inference code; you will first need to install the requirements for https://github.com/johnsmith0031/alpaca_lora_4bit.

```python
import torch
from peft import PeftModel
from transformers import AutoModelForCausalLM, LlamaTokenizer

model_name = "models/huggyllama_llama-30b/"
adapters_name = 'carpincho-30b-qlora'

print(f"Starting to load the model {model_name} into memory")

# Load the base LLaMA-30B weights with 4-bit quantization
model = AutoModelForCausalLM.from_pretrained(
    model_name,
    load_in_4bit=True,
    torch_dtype=torch.bfloat16,
    device_map="sequential"
)

print(f"Loading {adapters_name} into memory")

# Apply the Carpincho LoRA adapter on top of the base model
model = PeftModel.from_pretrained(model, adapters_name)
tokenizer = LlamaTokenizer.from_pretrained(model_name)
tokenizer.bos_token_id = 1

print(f"Successfully loaded the model {model_name} into memory")

def main(tokenizer):
    # Alpaca-style instruction prompt
    prompt = '''Below is an instruction that describes a task. Write a response that appropriately completes the request.

### Instruction:
%s

### Response:
''' % "Hola, como estas?"

    batch = tokenizer(prompt, return_tensors="pt")
    batch = {k: v.cuda() for k, v in batch.items()}

    with torch.no_grad():
        generated = model.generate(
            inputs=batch["input_ids"],
            do_sample=True,
            use_cache=True,
            repetition_penalty=1.1,
            max_new_tokens=100,
            temperature=0.9,
            top_p=0.95,
            top_k=40,
            return_dict_in_generate=True,
        )
    result_text = tokenizer.decode(generated['sequences'].cpu().tolist()[0])
    print(result_text)

main(tokenizer)
```
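
For interactive use you may want to stream tokens as they are generated instead of waiting for the full completion. Below is a minimal streaming sketch using transformers' `TextIteratorStreamer`; it assumes the `model` and `tokenizer` objects from the script above are already loaded, and the instruction text is only an example.

```python
from threading import Thread
from transformers import TextIteratorStreamer

# Reuses `model` and `tokenizer` from the script above.
prompt = '''Below is an instruction that describes a task. Write a response that appropriately completes the request.

### Instruction:
Contame un chiste corto.

### Response:
'''
inputs = tokenizer(prompt, return_tensors="pt").to("cuda")

# The streamer yields decoded text chunks as generate() produces tokens.
streamer = TextIteratorStreamer(tokenizer, skip_prompt=True, skip_special_tokens=True)

# generate() blocks, so run it in a background thread and consume the stream here.
thread = Thread(target=model.generate, kwargs=dict(
    inputs=inputs["input_ids"],
    streamer=streamer,
    do_sample=True,
    max_new_tokens=100,
    temperature=0.9,
    top_p=0.95,
))
thread.start()

for chunk in streamer:
    print(chunk, end="", flush=True)
thread.join()
```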

### Model Description

<!-- Provide a longer summary of what this model is. -->

- **Developed by:** Alfredo Ortega (@ortegaalfredo)
- **Model type:** 30B LLM QLoRA
- **Language(s) (NLP):** English and colloquial Argentine Spanish
- **License:** Free for non-commercial use, but I'm not the police.
- **Finetuned from model:** https://huggingface.co/huggyllama/llama-30b

### Model Sources

<!-- Provide the basic links for the model. -->

- **Repository:** https://huggingface.co/huggyllama/llama-30b
- **Paper:** https://arxiv.org/abs/2302.13971

## Uses

<!-- Address questions around how the model is intended to be used, including the foreseeable users of the model and those affected by the model. -->

This is a generic LLM chatbot that can be used to interact directly with humans.
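
As an illustration, a minimal chat loop can be built around the Alpaca-style template from the Usage section. The sketch below is only one way to wire this up: the `ask` helper is hypothetical, and `model` and `tokenizer` are assumed to be loaded as shown above.

```python
import torch  # model and tokenizer are assumed loaded per the Usage section

def ask(instruction: str) -> str:
    # Wrap the user's message in the Alpaca-style template used above
    prompt = '''Below is an instruction that describes a task. Write a response that appropriately completes the request.

### Instruction:
%s

### Response:
''' % instruction
    batch = tokenizer(prompt, return_tensors="pt").to("cuda")
    with torch.no_grad():
        out = model.generate(inputs=batch["input_ids"], do_sample=True,
                             max_new_tokens=200, temperature=0.9, top_p=0.95)
    text = tokenizer.decode(out[0], skip_special_tokens=True)
    # Return only the text generated after the "### Response:" marker
    return text.split("### Response:")[-1].strip()

while True:
    question = input("> ")
    if not question:
        break
    print(ask(question))
```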

## Bias, Risks, and Limitations

<!-- This section is meant to convey both technical and sociotechnical limitations. -->

This model is uncensored and may produce shocking answers. It also reflects the biases present in its training material.

### Recommendations

<!-- This section is meant to convey recommendations with respect to the bias, risk, and technical limitations. -->

Users (both direct and downstream) should be made aware of the risks, biases, and limitations of the model.

## Model Card Contact

Contact the creator at @ortegaalfredo on Twitter/GitHub.