---
pipeline_tag: text-generation
inference: true
widget:
- text: 'def print_hello_world():'
  example_title: Hello world
  group: Python
license: bigscience-openrail-m
pretrain-datasets:
- books
- arxiv
- c4
- falcon-refinedweb
- wiki
- github-issues
- stack_markdown
- self-made dataset of permissive github code
datasets:
- bigcode/the-stack-dedup
- rombodawg/2XUNCENSORED_MegaCodeTraining188k
- bigcode/commitpackft
library_name: llama.cpp
tags:
- code
language:
- en
---

# Refact 1.6B FIM GGUF

## Introduction

The Refact 1.6B FIM GGUF model is a 1.6-billion-parameter coding assistant developed by Small Magellanic Cloud AI Ltd. (Refact.ai), packaged here in GGUF format for use with llama.cpp. It is designed to assist developers with code completion, refactoring, and chat-based interactions, and performs well on code-related natural language understanding and generation tasks.

## Quantized Model Files

The model is provided in several quantized versions that trade file size and memory use against output quality; a sample download command follows the list:

- **refact-1.6B-fim-q4_0.gguf**: A 4-bit quantized model with a file size of 878 MB.
- **refact-1.6B-fim-q5_0.gguf**: A 5-bit quantized model with a file size of 1.1 GB.
- **refact-1.6B-fim-q8_0.gguf**: An 8-bit quantized model with a file size of 1.6 GB.
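
As a minimal sketch, one of these files can be fetched with `huggingface-cli` from the `huggingface_hub` package; the repository id below is a placeholder, so substitute this repository's actual id:

```sh
# Download a quantized model file into ./models.
# <repo-id> is a placeholder for this repository's Hugging Face id.
pip install -U huggingface_hub
huggingface-cli download <repo-id> refact-1.6B-fim-q4_0.gguf --local-dir ./models
```

The q4_0 file is the smallest and fastest to load; q8_0 stays closest to the original weights.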

## Features and Usage

The model is versatile and can be employed for:

- Code completion
- Code refactoring
- Chat-based interactions

### Example Usage

Here's a sample shell command to invoke the model:

```sh
# Sample shell command to use the model
./main -m models/smallcloudai/Refact-1_6B-fim/ggml-model-f16.gguf -n 300 -p "write a function to multiply two integers in python" --temp 1.0 --top-p 1.0 --top-k 1 --repeat_penalty 1.0
```
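
Since this is a fill-in-the-middle (FIM) model, it can also complete code between a given prefix and suffix. The sketch below assumes the FIM special tokens documented on the upstream Refact model card (`<fim_prefix>`, `<fim_suffix>`, `<fim_middle>`); the `-e` flag makes llama.cpp interpret the `\n` escapes in the prompt:

```sh
# Fill-in-the-middle: generate the code that belongs between prefix and suffix.
# The FIM token names are taken from the upstream Refact model card.
./main -m models/smallcloudai/Refact-1_6B-fim/ggml-model-f16.gguf -n 128 -e \
  -p '<fim_prefix>def multiply(a, b):\n    """Multiply two integers."""\n<fim_suffix>\n    return result<fim_middle>' \
  --temp 0.2 --top-k 1
```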

## Performance Metrics

Despite its size, the model performs strongly in code completion, as the HumanEval results below show:

| Model                | Size  | HumanEval pass@1 | HumanEval pass@10 |
|----------------------|-------|------------------|-------------------|
| **Refact-1.6-fim**   | 1.6B  | 32.0%            | 53.0%             |
| StableCode           | 3B    | 20.2%            | 33.8%             |
| ReplitCode v1        | 3B    | 21.9%            | N/A               |
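
For reference, pass@k is the standard HumanEval metric: the probability that at least one of k sampled completions for a problem passes all of its unit tests, estimated per Chen et al. (2021) as

$$\text{pass@}k \;=\; \mathbb{E}_{\text{problems}}\!\left[\, 1 - \frac{\binom{n-c}{k}}{\binom{n}{k}} \,\right]$$

where n completions are sampled per problem and c of them pass.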

## Installation and Setup

The model can be integrated into your IDE via the [Refact plugin](https://refact.ai/). For self-hosting, an [open-source Docker container](https://github.com/smallcloudai/refact) is available.
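
As a sketch, a self-hosted server can be started with Docker roughly as follows; the image name, port, and volume are taken from the smallcloudai/refact README at the time of writing, so verify them against that repository before use:

```sh
# Start the self-hosted Refact server (requires the NVIDIA container toolkit).
# Image name, port, and volume follow the smallcloudai/refact README.
docker run -d --rm --gpus all \
  -p 8008:8008 \
  -v refact-perm-storage:/perm_storage \
  smallcloud/refact_self_hosting
```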

## Limitations and Bias

The model was trained primarily on English text and source code, so performance on non-English input may be lower.

## Technical Specifications

- **Architecture**: LLaMA-like model with multi-query attention
- **Training Tokens**: 1.2T for pretraining, 40B for fine-tuning
- **Precision**: bfloat16
- **Training Time**: 28 days

## License

The model is licensed under the BigScience OpenRAIL-M v1 license agreement.

## Citation

If you use this model in your work, please link back to the following page for attribution:

[Refact 1.6B FIM Model](https://huggingface.co/smallcloudai/Refact-1_6B-fim)

## Acknowledgments

Special thanks to [ds5t5](https://github.com/ggerganov/llama.cpp/pull/3329) for implementing the conversion of the model's tensors from the Hugging Face format to GGUF. Their work made this GGUF release possible.

### Example Command for Testing

To convert the Hugging Face checkpoint to GGUF yourself and sanity-check the result, you can use the following commands:

```sh
# Example command for testing against Hugging Face
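# Convert the HF checkpoint to GGUF; the trailing 1 selects f16 output
# (llama.cpp converter convention: 0 = f32, 1 = f16).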
python3 convert-refact-hf-to-gguf.py ./Refact-1_6B-fim 1

./main -m ./Refact-1_6B-fim/ggml-model-f16.gguf -n 300 -p "write a function to multiply two integers in python"  --temp 1.0 --top-p 1.0 --top-k 1 --repeat_penalty 1.0
```

This conversion work resolves llama.cpp issue [#3061](https://github.com/ggerganov/llama.cpp/issues/3061).