---
license: cc
inference: false
language:
- fa
- en
tags:
- llama
- text-generation-inference
pipeline_tag: text-generation
---

# UniversityOfTehran's PersianMind-v1.0 GGUF

This repository contains GGUF-format model files for [UniversityOfTehran's PersianMind-v1.0](https://huggingface.co/universitytehran/PersianMind-v1.0).

GGUF files are for CPU + GPU inference using [llama.cpp](https://github.com/ggerganov/llama.cpp) and the libraries and UIs that support this format, such as:
* [text-generation-webui](https://github.com/oobabooga/text-generation-webui)
* [KoboldCpp](https://github.com/LostRuins/koboldcpp)
* [ParisNeo/GPT4All-UI](https://github.com/ParisNeo/gpt4all-ui)
* [llama-cpp-python](https://github.com/abetlen/llama-cpp-python)
* [ctransformers](https://github.com/marella/ctransformers)

## How to run in `llama.cpp`

I use the following command line; adjust it to your needs:

```
./main -t 2 -ngl 32 -m PersianMind-v1.0.q4_K_M.gguf --color -c 2048 --temp 0.7 --repeat_penalty 1.2 -n -1 -e -p "This is a conversation with PersianMind. It is an artificial intelligence model designed by a team of NLP experts at the University of Tehran to help you with various tasks such as answering questions, providing recommendations, and helping with decision making. You can ask it anything you want and it will do its best to give you accurate and relevant information.\nYou: در مورد هوش مصنوعی توضیح بده.\nPersianMind: "
```
Change `-t 2` to the number of physical CPU cores you have. For example, if your system has 8 cores/16 threads, use `-t 8`.

Change `-ngl 32` to the number of layers to offload to GPU. Remove it if you don't have GPU acceleration.
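
If you launch llama.cpp from a script, the two adjustments above can be folded into a small helper. The following is a minimal sketch, not part of llama.cpp: the function name `build_llama_args` and its parameters are hypothetical, and it only assembles the same flags used in the command above.

```python
import shlex


def build_llama_args(model_path, prompt, physical_cores=2, gpu_layers=32):
    """Assemble the argument list for llama.cpp's ./main (hypothetical helper).

    physical_cores maps to -t; gpu_layers maps to -ngl and is omitted
    when it is 0, i.e. when there is no GPU acceleration.
    """
    args = ["./main", "-t", str(physical_cores)]
    if gpu_layers > 0:
        args += ["-ngl", str(gpu_layers)]
    args += [
        "-m", model_path,
        "--color",
        "-c", "2048",
        "--temp", "0.7",
        "--repeat_penalty", "1.2",
        "-n", "-1",
        "-e",
        "-p", prompt,
    ]
    return args


# Example: an 8-core machine without GPU acceleration.
args = build_llama_args(
    "PersianMind-v1.0.q4_K_M.gguf",
    "You: در مورد هوش مصنوعی توضیح بده.\nPersianMind: ",
    physical_cores=8,
    gpu_layers=0,
)
print(shlex.join(args))
```

The returned list can be passed to `subprocess.run(args)` from the directory that contains the `main` binary and the model file.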

For a chat-style conversation, replace the `-p <PROMPT>` argument with `-i -ins`. You can also use `--interactive-first` to start in interactive mode.
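
If you would rather script a conversation than use interactive mode, you can build the prompt string yourself. The sketch below reuses the system text and the `You:` / `PersianMind:` labels from the command above. The multi-turn layout is an assumption extrapolated from that single-turn prompt, and `build_prompt` is a hypothetical helper, not an official template.

```python
# System text copied from the example command above.
SYSTEM_PROMPT = (
    "This is a conversation with PersianMind. It is an artificial intelligence "
    "model designed by a team of NLP experts at the University of Tehran to help "
    "you with various tasks such as answering questions, providing "
    "recommendations, and helping with decision making. You can ask it anything "
    "you want and it will do its best to give you accurate and relevant "
    "information."
)


def build_prompt(turns, user_message):
    """Return a prompt string ending with an open 'PersianMind: ' turn.

    turns is a list of (user_text, model_text) pairs from earlier exchanges.
    The multi-turn layout is an assumption, not a documented format.
    """
    lines = [SYSTEM_PROMPT]
    for user_text, model_text in turns:
        lines.append(f"You: {user_text}")
        lines.append(f"PersianMind: {model_text}")
    lines.append(f"You: {user_message}")
    lines.append("PersianMind: ")
    return "\n".join(lines)


# Single-turn prompt equivalent to the one in the example command.
prompt = build_prompt([], "در مورد هوش مصنوعی توضیح بده.")
```

With an empty `turns` list, this yields the same text as the `-p` argument in the command above, with real newlines in place of the `\n` escapes.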

## Compatibility

I have uploaded files for both the original llama.cpp quant methods (`q4_0, q4_1, q5_0, q5_1, q8_0`) and the k-quant methods (`q2_K, q3_K_S, q3_K_M, q3_K_L, q4_K_S, q4_K_M, q5_K_S, q6_K`).

Please refer to [llama.cpp](https://github.com/ggerganov/llama.cpp) and [TheBloke](https://huggingface.co/TheBloke)'s GGUF models for further explanation.

## How to run in `text-generation-webui`

Further instructions here: [text-generation-webui/docs/llama.cpp-models.md](https://github.com/oobabooga/text-generation-webui/blob/main/docs/llama.cpp-models.md).

<!-- footer start -->
## Thanks

Thanks to [Pedram Rostami, Ali Salemi, and Mohammad Javad Dousti](https://huggingface.co/universitytehran) for providing checkpoints of the model.

Thanks to [Georgi Gerganov](https://github.com/ggerganov) and all of the awesome people in the AI community.