library: llama.cpp
library_link: https://github.com/ggerganov/llama.cpp
base_model:
- smallcloudai/Refact-1_6B-fim
pipeline_tag: text-generation
inference: true
widget:
- text: 'def print_hello_world():'
  example_title: Hello world
  group: Python
license: bigscience-openrail-m
pretrain-datasets:
- books
- arxiv
- c4
- falcon-refinedweb
- wiki
- github-issues
- stack_markdown
- self-made dataset of permissive github code
datasets:
- bigcode/the-stack-dedup
- rombodawg/2XUNCENSORED_MegaCodeTraining188k
- bigcode/commitpackft
language:
- en
tags:
- nlp
- code
- gguf
Refact 1.6B FIM GGUF
Introduction
Refact 1.6B FIM GGUF is the GGUF build of smallcloudai/Refact-1_6B-fim, a code model developed by Small Magellanic Cloud AI Ltd., packaged for use with llama.cpp. It is designed to assist developers with code completion, refactoring, and chat-based interactions, and targets code-related natural language understanding and generation tasks.
Quantized Model Files
The model comes in various quantized versions to suit different computational needs:
- refact-1.6B-fim-q8_0.gguf: An 8-bit quantized model with a file size of 1.69 GB.
- refact-1.6B-fim-f16.gguf: A half-precision (16-bit float) model with a file size of 3.17 GB.
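The two file sizes follow roughly from the bits stored per weight. The sketch below is a back-of-the-envelope check, not an exact accounting: it assumes a nominal 1.6e9 weights (taken from the model name) and ignores file metadata and any tensors kept at a different precision. The 8.5 bits per weight for Q8_0 comes from its block layout of 32 int8 values plus one 16-bit scale.

```python
# Rough size estimate for the two GGUF files.
N_PARAMS = 1.6e9  # assumption: nominal parameter count from the "1.6B" name

# Q8_0 stores weights in blocks of 32 int8 values plus one fp16 scale:
# (32 * 8 + 16) bits per 32 weights = 8.5 bits per weight.
Q8_0_BITS_PER_WEIGHT = (32 * 8 + 16) / 32
F16_BITS_PER_WEIGHT = 16.0


def estimated_size_gb(n_params: float, bits_per_weight: float) -> float:
    """Return the approximate tensor payload in decimal gigabytes."""
    return n_params * bits_per_weight / 8 / 1e9


q8_size = estimated_size_gb(N_PARAMS, Q8_0_BITS_PER_WEIGHT)
f16_size = estimated_size_gb(N_PARAMS, F16_BITS_PER_WEIGHT)
print(f"q8_0: ~{q8_size:.2f} GB, f16: ~{f16_size:.2f} GB")
```

The estimates land close to the listed 1.69 GB and 3.17 GB, which suggests the q8_0 file roughly halves the memory needed to load the model compared with f16.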
Features and Usage
The model is versatile and can be employed for:
- Code completion
- Code refactoring
- Chat-based interactions
Example Usage
Here's a sample shell command to invoke the model:
# Sample shell command to use the model
./main -m models/smallcloudai/Refact-1_6B-fim/ggml-model-f16.gguf -n 300 -p "write a function to multiply two integers in python" --temp 1.0 --top-p 1.0 --top-k 1 --repeat_penalty 1.0
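The command above uses a plain instruction prompt. Because the base model is a fill-in-the-middle (FIM) model, a prompt can also be arranged as prefix, suffix, then a marker for the gap. The following is a minimal sketch of that arrangement. The sentinel strings `<fim_prefix>`, `<fim_suffix>`, and `<fim_middle>` are an assumption (StarCoder-style tokens), and the helper names `build_fim_prompt` and `build_command` are hypothetical; verify the tokens against the base model's tokenizer, and note that whether `./main` treats them as special tokens may depend on the llama.cpp build.

```python
import shlex

# Assumption: StarCoder-style FIM sentinel tokens. Check the base model's
# tokenizer for the exact strings before relying on these.
FIM_PREFIX = "<fim_prefix>"
FIM_SUFFIX = "<fim_suffix>"
FIM_MIDDLE = "<fim_middle>"


def build_fim_prompt(prefix: str, suffix: str) -> str:
    """Arrange the code before and after the gap in prefix-suffix-middle order."""
    return f"{FIM_PREFIX}{prefix}{FIM_SUFFIX}{suffix}{FIM_MIDDLE}"


def build_command(model_path: str, prompt: str, n_predict: int = 300) -> list[str]:
    """Build the argument list for ./main, reusing only the flags shown above."""
    return [
        "./main", "-m", model_path, "-n", str(n_predict), "-p", prompt,
        "--temp", "1.0", "--top-p", "1.0", "--top-k", "1",
        "--repeat_penalty", "1.0",
    ]


prompt = build_fim_prompt(
    prefix="def multiply(a: int, b: int) -> int:\n    ",
    suffix="\n\nprint(multiply(2, 3))\n",
)
cmd = build_command("models/smallcloudai/Refact-1_6B-fim/ggml-model-f16.gguf", prompt)
print(shlex.join(cmd))  # shell-quoted form of the command
```

Passing `cmd` as a list to `subprocess.run` avoids shell quoting issues with the newlines in the prompt.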
Performance Metrics
In the HumanEval results below, the model scores higher than both 3B models listed on pass@1, and higher than StableCode on pass@10, despite being roughly half their size.
| Model | Size | HumanEval pass@1 | HumanEval pass@10 |
|---|---|---|---|
| Refact-1.6-fim | 1.6b | 32.0% | 53.0% |
| StableCode | 3b | 20.2% | 33.8% |
| ReplitCode v1 | 3b | 21.9% | N/A |
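For readers unfamiliar with the metric, pass@k is the probability that at least one of k sampled completions passes a problem's unit tests. This card does not say how the scores above were computed; the sketch below shows the unbiased estimator commonly used with HumanEval, 1 - C(n-c, k) / C(n, k), where n completions are sampled per problem and c of them pass. The tallies in `results` are made up for illustration.

```python
from math import comb


def pass_at_k(n: int, c: int, k: int) -> float:
    """Unbiased pass@k estimate for one problem: 1 - C(n-c, k) / C(n, k)."""
    if not 0 <= c <= n or not 1 <= k <= n:
        raise ValueError("require 0 <= c <= n and 1 <= k <= n")
    if n - c < k:
        # Fewer than k failures exist, so any k samples include a pass.
        return 1.0
    return 1.0 - comb(n - c, k) / comb(n, k)


# Hypothetical (n, c) tallies for three problems, not real benchmark data.
results = [(20, 0), (20, 3), (20, 20)]

for k in (1, 10):
    score = sum(pass_at_k(n, c, k) for n, c in results) / len(results)
    print(f"pass@{k}: {score:.1%}")
```

The benchmark score is the mean of the per-problem estimates, which is why pass@10 is always at least as high as pass@1.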
Installation and Setup
The model can be integrated into your IDE via the Refact plugin. For self-hosting, an open-source Docker container is available.
Limitations and Bias
The model primarily focuses on English text, which may result in lower performance for non-English languages.
Technical Specifications
- Architecture: LLaMA-like model with multi-query attention
- Training Tokens: 1.2T for pretraining, 40B for fine-tuning
- Precision: bfloat16
- Training Time: 28 days
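The token counts and training time above imply an average training throughput. The short calculation below is only indicative: it assumes the 28 days cover both pretraining and fine-tuning and that training ran continuously, neither of which is stated here.

```python
PRETRAIN_TOKENS = 1.2e12  # 1.2T tokens, from the list above
FINETUNE_TOKENS = 40e9    # 40B tokens, from the list above
TRAINING_DAYS = 28        # assumption: covers both phases, no downtime

total_tokens = PRETRAIN_TOKENS + FINETUNE_TOKENS
seconds = TRAINING_DAYS * 24 * 60 * 60
tokens_per_second = total_tokens / seconds

print(f"{total_tokens:.3g} tokens in {seconds:,} s "
      f"is about {tokens_per_second:,.0f} tokens/s on average")
```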
License
The model is licensed under the BigScience OpenRAIL-M v1 license agreement.
Citation
If you use this model in your work, please cite it by linking back to the following page for proper attribution:
Acknowledgments
Special thanks to ds5t5 for their contribution in implementing the source for converting the model's tensors from Hugging Face to GGUF format. Their work has been instrumental in enhancing the model's versatility.
Example Command for Testing
To compare the converted model's output against the Hugging Face implementation, convert the checkpoint to GGUF and run it with a fixed seed:
# Example command for testing against Hugging Face
python convert-hf-to-gguf.py models/smallcloudai/Refact-1_6B-fim
./main --color -e -s 1 -c 256 -n 256 -m ./models/smallcloudai/Refact-1_6B-fim/ggml-model-f16.gguf -p "def multiply(a: int, b: int) -> int:"
This resolves llama.cpp issue #3061.