---
license: mit
datasets:
- sinarashidi/alpaca-persian
language:
- en
- fa
library_name: transformers
---

# Maral 7B Alpha 1

<p align="center">
<img src="maral-7b-announce.png" width=256 height=256 />
</p>

## What is Maral?

_Maral_ is a new large language model specializing in the Persian language. It is based on [Mistral](https://huggingface.co/mistralai/Mistral-7B-v0.1) and trained on the _Alpaca Persian_ dataset. It is one of the few efforts in the Persian-speaking community to give our language a new life in the era of AI.

Since Maral is based on Mistral, it is capable of producing English answers as well.

### What does "Maral" mean?

Maral is the Persian name of the [red deer](https://en.wikipedia.org/wiki/Red_deer), a deer species native to Iran. The name was chosen for a couple of reasons: first, our environmental concerns, and second, since this is a Persian LLM made by Iranian people, it deserves an Iranian name.

## Inference

### Prompt Format

This model requires the _Guanaco_ prompt format, which looks like this:

```
### Human: <prompt>
### Assistant: <answer>
```

So in your code, you may write prompts like this:

```python
# "Who was the president of the United States in 1996?"
prompt = "در سال ۱۹۹۶ چه کسی رییس جمهور آمریکا بود؟"
prompt = f"### Human:{prompt}\n### Assistant:"
```

More details are in the inference sections below.

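
If you build prompts in more than one place, a small helper keeps the template consistent. The sketch below is only an illustration: `build_prompt` is a hypothetical name, not part of any library, and it simply mirrors the f-string shown in this card (including the missing space after `### Human:`).

```python
def build_prompt(user_message: str) -> str:
    """Wrap a user message in the Guanaco-style template used in this card."""
    # The prompt ends right after "### Assistant:" so that the model
    # continues the text with its answer.
    return f"### Human:{user_message.strip()}\n### Assistant:"


# "Who was the president of the United States in 1996?"
prompt = build_prompt("در سال ۱۹۹۶ چه کسی رییس جمهور آمریکا بود؟")
print(prompt)
```

The returned string can be passed to the tokenizer exactly like the hand-written prompt.
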
### 4-bit Quantization

If you want to use 4-bit quantization, we have a PEFT version for you [here](https://huggingface.co/MaralGPT/MaralGPT-Mistral-7B-v-0-1). You can also find _Google Colab_ notebooks [here](https://github.com/prp-e/maralgpt).

### Installing Libraries

```pip install transformers accelerate bitsandbytes```

_NOTE_: The `bitsandbytes` library is only needed for the 8-bit version. Otherwise, it's not necessary.

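
Before loading the model, it can be handy to confirm that the packages are importable. This is a minimal sketch using only the Python standard library; the `REQUIRED` and `OPTIONAL` lists and the `missing_packages` helper are illustrative names, and `torch` is included on the assumption that you run the snippets in this card, which import it.

```python
import importlib.util

REQUIRED = ["transformers", "accelerate", "torch"]
OPTIONAL = ["bitsandbytes"]  # only needed for the 8-bit version


def missing_packages(names):
    """Return the names that cannot be found in the current environment."""
    return [name for name in names if importlib.util.find_spec(name) is None]


print("Missing required packages:", missing_packages(REQUIRED))
print("Missing optional packages:", missing_packages(OPTIONAL))
```

An empty list means every package in that group was found.
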
### Inference on a big GPU

If you have a big enough GPU, such as an A100, this code is for you.

```python
from transformers import AutoModelForCausalLM, AutoTokenizer, GenerationConfig
import torch

model_name_or_id = "MaralGPT/Maral-7B-alpha-1"

# Load the weights in bfloat16 and let accelerate place them on the available device(s).
model = AutoModelForCausalLM.from_pretrained(model_name_or_id, torch_dtype=torch.bfloat16, device_map="auto")
tokenizer = AutoTokenizer.from_pretrained(model_name_or_id)

# "Who was the president of the United States in 1996?"
prompt = "در سال ۱۹۹۶ چه کسی رییس جمهور آمریکا بود؟"
prompt = f"### Human:{prompt}\n### Assistant:"

inputs = tokenizer(prompt, return_tensors="pt").to("cuda")

generation_config = GenerationConfig(
    do_sample=True,
    top_k=1,  # keeps only the most likely token, so sampling is effectively greedy
    temperature=0.5,
    max_new_tokens=300,
    pad_token_id=tokenizer.eos_token_id
)

outputs = model.generate(**inputs, generation_config=generation_config)
# The decoded text contains the prompt followed by the model's answer.
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
```

### Inference on a small GPU (Consumer Hardware/Free Colab)

The code is pretty much the same as above, with a slight difference:

* Make sure `bitsandbytes` is installed correctly.
* Load the model with `model = AutoModelForCausalLM.from_pretrained(model_name_or_id, load_in_8bit=True, torch_dtype=torch.bfloat16, device_map="auto")`

On the _free version_ of Google Colab, you may run out of RAM. Passing `low_cpu_mem_usage=True` when loading the model may help.

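
To see why 8-bit loading matters on small GPUs, a rough estimate of the memory taken by the weights can help. The sketch below assumes a round 7 billion parameters (the exact count is not stated in this card) and counts the weights only, ignoring activations, the KV cache, and framework overhead. `weight_memory_gib` is an illustrative helper, not a library function.

```python
def weight_memory_gib(num_params: float, bits_per_param: int) -> float:
    """Approximate memory needed for the model weights alone, in GiB."""
    return num_params * bits_per_param / 8 / 1024**3


NUM_PARAMS = 7e9  # assumption: round figure for a "7B" model

for label, bits in [("bfloat16", 16), ("8-bit", 8), ("4-bit", 4)]:
    print(f"{label:>8}: ~{weight_memory_gib(NUM_PARAMS, bits):.1f} GiB")
```

Under these assumptions the weights take roughly 13 GiB in bfloat16, 6.5 GiB in 8-bit, and 3.3 GiB in 4-bit. Actual usage during generation will be higher.
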
## Known Issues

* The model produces GPT-3.5-level answers in terms of grammar (especially in Persian) but is prone to severe hallucinations. This can be addressed with a better dataset and better training procedures (such as DPO).
* Related to the previous issue, the model can also generate misleading answers, especially when dealing with _reasoning_ problems in Persian.
* The model is large, so it requires a lot of resources to work correctly. However, we may provide _GPTQ_ or _GGUF_ versions as well.
* The prompt format works and proves our concept of an _instruction-following_ LLM, but since we haven't changed `eos_token` and `bos_token` to our own, you may see unnecessary information being generated by the model.
* Related to the previous issue, the model may repeat itself. As a _temporary_ workaround, keep the temperature below 1. According to our tests, somewhere between 0.5 and 0.7 is a sweet spot.

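
Until the `eos_token` issue is addressed, one possible workaround is to trim the decoded text yourself. The sketch below is an assumption about how you might post-process the output, not something shipped with the model: `extract_answer` is a hypothetical helper that keeps only the first assistant reply and drops any extra turns the model generates afterwards.

```python
HUMAN_TAG = "### Human:"
ASSISTANT_TAG = "### Assistant:"


def extract_answer(generated_text: str) -> str:
    """Return only the first assistant reply from decoded model output."""
    # Keep what follows the first "### Assistant:" marker.
    _, found, after = generated_text.partition(ASSISTANT_TAG)
    if not found:
        # No marker at all: return the text unchanged apart from whitespace.
        return generated_text.strip()
    # Cut off any further turns the model invented after its own answer.
    answer = after.split(HUMAN_TAG, 1)[0]
    answer = answer.split(ASSISTANT_TAG, 1)[0]
    return answer.strip()


decoded = "### Human:Hello\n### Assistant: Hi, how can I help?\n### Human: Another question"
print(extract_answer(decoded))
```

You would call it on the string returned by `tokenizer.decode(...)`.
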
## Our Team

* Muhammadreza Haghiri ([Website](https://haghiri75.com/en) - [GitHub](https://github.com/prp-e) - [LinkedIn](https://www.linkedin.com/in/muhammadreza-haghiri-1761325b))
* Mahi Mohrechi ([Website](https://mohrechi-portfolio.vercel.app/) - [GitHub](https://github.com/f-mohrechi) - [LinkedIn](https://www.linkedin.com/in/faeze-mohrechi/))

## Special Thanks

* The Mistral team, for providing the best open source base model ever.
* _Sina Rashidi_, who translated the Alpaca dataset to Persian.
* The [Jupyto](https://jupyto.com) team, for providing our infrastructure.