---
{}
---
# Zenos GPT-J 6B Instruct 4-bit
## Model Overview
- **Name:** zenos-gpt-j-6B-instruct-4bit
- **Datasets Used:** [Alpaca Spanish](https://huggingface.co/datasets/bertin-project/alpaca-spanish), [Evol Instruct](https://huggingface.co/datasets/FreedomIntelligence/evol-instruct-spanish)
- **Architecture:** GPT-J
- **Model Size:** 6 Billion parameters
- **Precision:** 4 bits
- **Fine-tuning:** This model was fine-tuned using Low-Rank Adaptation (LoRA); a configuration sketch follows this list.
- **Content Moderation:** This model is not moderated.
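For reference, LoRA fine-tuning with the PEFT library typically looks like the minimal sketch below. Every hyperparameter here (rank, alpha, target modules, dropout) is an illustrative assumption, as is the base checkpoint; the values actually used to train this model were not published.
```python
from peft import LoraConfig, get_peft_model
from transformers import AutoModelForCausalLM

# Base model (assumed); LoRA trains small adapter matrices on top of it
base = AutoModelForCausalLM.from_pretrained("EleutherAI/gpt-j-6b")

lora_config = LoraConfig(
    r=16,                                 # assumed adapter rank
    lora_alpha=32,                        # assumed scaling factor
    target_modules=["q_proj", "v_proj"],  # GPT-J attention projections
    lora_dropout=0.05,
    bias="none",
    task_type="CAUSAL_LM",
)

model = get_peft_model(base, lora_config)
model.print_trainable_parameters()  # only the adapter weights are trainable
```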
## Description
Zenos GPT-J 6B Instruct 4-bit is a Spanish instruction-following model based on the GPT-J architecture with 6 billion parameters. It has been fine-tuned on the Alpaca Spanish and Evol Instruct datasets, making it particularly suitable for natural language understanding and generation tasks in Spanish.
An experimental Twitter (**X**) bot is available at [https://twitter.com/ZenosBot](https://twitter.com/ZenosBot), which comments on news published by media outlets in Argentina.
### Requirements
The latest development version of Transformers is required, since it includes serialization of 4-bit models.
- [Transformers](https://huggingface.co/docs/transformers/installation#install-from-source)
- `bitsandbytes` >= 0.41.3
Since this is a compressed (4-bit) version, it fits into ~7 GB of VRAM.
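Assuming a standard pip-based setup, installing both requirements might look like this (the source install is the step linked above; `accelerate` is typically also needed for the bitsandbytes integration):
```bash
pip install git+https://github.com/huggingface/transformers
pip install "bitsandbytes>=0.41.3" accelerate
```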
## Usage
You can use this model for natural language processing tasks such as text generation and summarization. Below is an example of how to use it in Python with the Transformers library:
```python
import torch
from transformers import AutoTokenizer, AutoModelForCausalLM, GenerationConfig
# Load the tokenizer and model
tokenizer = AutoTokenizer.from_pretrained("webpolis/zenos-gpt-j-6B-instruct-4bit")
model = AutoModelForCausalLM.from_pretrained(
"webpolis/zenos-gpt-j-6B-instruct-4bit",
use_safetensors=True
)
# Spanish prompt: "Write a short poem using the following concepts:
# Well-being, Current, Enlightenment, Thirst"
user_msg = '''Escribe un poema breve utilizando los siguientes conceptos:
Bienestar, Corriente, Iluminación, Sed'''
# Build the prompt; note the single spaces inside [INST] ... [/INST]
prompt = f'[INST] {user_msg} [/INST]'
inputs = tokenizer(prompt, return_tensors="pt")
input_ids = inputs["input_ids"].to(model.device)
attention_mask = inputs["attention_mask"].to(model.device)
generation_config = GenerationConfig(
temperature=0.2,
top_p=0.8,
top_k=40,
num_beams=1,
repetition_penalty=1.3,
do_sample=True
)
with torch.no_grad():
generation_output = model.generate(
input_ids=input_ids,
pad_token_id=tokenizer.eos_token_id,
attention_mask=attention_mask,
generation_config=generation_config,
return_dict_in_generate=True,
output_scores=False,
max_new_tokens=512,
early_stopping=True
)
s = generation_output.sequences[0]
output = tokenizer.decode(s)
start_txt = output.find('[/INST]') + len('[/INST]')
end_txt = output.find("<|endoftext|>", start_txt)
answer = output[start_txt:end_txt]
print(answer)
```
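If you prefer tokens to be printed as they are generated instead of decoded at the end, Transformers provides `TextStreamer`, which can be passed directly to `generate`. A minimal sketch, reusing the tokenizer, model, inputs, and generation config from the example above:
```python
from transformers import TextStreamer

# Streams decoded tokens to stdout as they are produced,
# skipping the echoed prompt and special tokens
streamer = TextStreamer(tokenizer, skip_prompt=True, skip_special_tokens=True)

with torch.no_grad():
    model.generate(
        input_ids=input_ids,
        attention_mask=attention_mask,
        generation_config=generation_config,
        pad_token_id=tokenizer.eos_token_id,
        max_new_tokens=512,
        streamer=streamer,
    )
```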
# Inference
## Online
Currently, Hugging Face's hosted inference widget doesn't load the model properly. However, you can use the model with regular Python code as shown above once you meet the [requirements](#requirements).
## CPU
Best performance can be achieved by downloading the [GGML 4-bit](https://huggingface.co/webpolis/zenos-gpt-j-6B-instruct-4bit/resolve/main/ggml-f16-q4_0.bin) model and running inference with the [rustformers' llm](https://github.com/rustformers/llm) tool, as sketched after the requirements below.
### Requirements
For optimal performance:
- 4 CPU cores
- 8 GB RAM
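A typical invocation might look like the sketch below. Treat it as an assumption rather than a verified command line: the `llm` CLI's subcommands and flags have changed between releases, so check `llm --help` for your version.
```bash
# Hypothetical invocation; exact flags vary across rustformers/llm releases
llm infer -a gptj \
  -m ggml-f16-q4_0.bin \
  -p '[INST] Escribe un poema breve [/INST]'
```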
On my Core i7 laptop it runs at roughly 250 ms per token:
![](https://huggingface.co/webpolis/zenos-gpt-j-6B-instruct-4bit/resolve/main/poema1.gif)
# Acknowledgments
This model was developed by [Nicolás Iglesias](mailto:[email protected]) using the Hugging Face Transformers library.
# License
Copyright 2023 [Nicolás Iglesias](mailto:[email protected])
Licensed under the Apache License, Version 2.0 (the "License");
you may not use this software except in compliance with the License.
You may obtain a copy of the License at
[Apache License 2.0](http://www.apache.org/licenses/LICENSE-2.0)
Unless required by applicable law or agreed to in writing, software
distributed under the License is distributed on an "AS IS" BASIS,
WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
See the License for the specific language governing permissions and
limitations under the License.