---
pipeline_tag: text-generation
inference: true
widget:
  - text: 'def print_hello_world():'
    example_title: Hello world
    group: Python
license: bigscience-openrail-m
pretrain-datasets:
  - books
  - arxiv
  - c4
  - falcon-refinedweb
  - wiki
  - github-issues
  - stack_markdown
  - self-made dataset of permissive github code
datasets:
  - bigcode/the-stack-dedup
  - rombodawg/2XUNCENSORED_MegaCodeTraining188k
  - bigcode/commitpackft
library_name: llama.cpp
tags:
  - code
language:
  - en
---

Refact 1.6B FIM GGUF

Introduction

The Refact 1.6B FIM GGUF model is a state-of-the-art AI-powered coding assistant developed by Small Magellanic Cloud AI Ltd. This repository provides GGUF quantizations of the model for use with llama.cpp. The model is designed to assist developers with code completion, code refactoring, and chat-based interactions, and it performs strongly on code-related natural language understanding and generation tasks.

Quantized Model Files

The model comes in various quantized versions to suit different computational needs:

  • refact-1.6B-fim-q4_0.gguf: A 4-bit quantized model with a file size of 878 MB.
  • refact-1.6B-fim-q5_0.gguf: A 5-bit quantized model with a file size of 1.1 GB.
  • refact-1.6B-fim-q8_0.gguf: An 8-bit quantized model with a file size of 1.6 GB.
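Lower-bit quantizations are smaller and faster at some cost in accuracy. As a minimal sketch, assuming the q4_0 file has been downloaded into a local models/ directory (the path below is illustrative), it can be run with llama.cpp like this:

# Minimal llama.cpp invocation of the 4-bit quant (local path is illustrative)
./main -m ./models/refact-1.6B-fim-q4_0.gguf -n 128 -p "def fibonacci(n):"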

Features and Usage

The model is versatile and can be employed for:

  • Code completion
  • Code refactoring
  • Chat-based interactions

Example Usage

Here's a sample shell command to invoke the model:

# Sample shell command to use the model
# -m: path to the GGUF file, -n: maximum tokens to generate, -p: prompt
# --top-k 1 makes sampling effectively greedy, so --temp and --top-p have little practical effect here
./main -m models/smallcloudai/Refact-1_6B-fim/ggml-model-f16.gguf -n 300 -p "write a function to multiply two integers in python" --temp 1.0 --top-p 1.0 --top-k 1 --repeat_penalty 1.0
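Because the model is trained with a fill-in-the-middle (FIM) objective, it can also complete code between a given prefix and suffix. The sketch below assumes StarCoder-style special tokens (<fim_prefix>, <fim_suffix>, <fim_middle>); verify the exact token names against the model's tokenizer before relying on them:

# FIM-style prompt sketch; the special token names are an assumption, check the tokenizer
./main -m models/smallcloudai/Refact-1_6B-fim/ggml-model-f16.gguf -n 64 -p "<fim_prefix>def multiply(a, b):<fim_suffix>    return result<fim_middle>"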

Performance Metrics

The model outperforms many existing models in code completion and chat-based interactions; the HumanEval results below illustrate its code-completion performance.

| Model          | Size | HumanEval pass@1 | HumanEval pass@10 |
|----------------|------|------------------|-------------------|
| Refact-1.6-fim | 1.6b | 32.0%            | 53.0%             |
| StableCode     | 3b   | 20.2%            | 33.8%             |
| ReplitCode v1  | 3b   | 21.9%            | N/A               |

Installation and Setup

The model can be integrated into your IDE via the Refact plugin. For self-hosting, an open-source Docker container is available.
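As a rough sketch of the self-hosted route, a server is typically started with Docker. The image name, port, and volume below are assumptions drawn from the Refact self-hosting documentation rather than from this card, so verify them before use:

# Hypothetical self-hosting invocation; confirm the image name and port in the Refact docs
docker run -d --gpus all -p 8008:8008 -v refact-perm-storage:/perm_storage smallcloud/refact_self_hosting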

Limitations and Bias

The model was trained primarily on English text, so performance on non-English languages may be lower.

Technical Specifications

  • Architecture: LLAMA-like model with multi-query attention
  • Training Tokens: 1.2T for pretraining, 40B for fine-tuning
  • Precision: bfloat16
  • Training Time: 28 days

License

The model is licensed under the BigScience OpenRAIL-M v1 license agreement.

Citation

If you use this model in your work, please cite it by linking back to the following page for proper attribution:

Refact 1.6B FIM Model

Acknowledgments

Special thanks to ds5t5 for implementing the conversion of the model's tensors from the Hugging Face format to GGUF. Their work has been instrumental in making the model usable with llama.cpp.

Example Command for Testing

To convert the original Hugging Face checkpoint to GGUF and compare the result against the Hugging Face model, you can use the following commands:

# Convert the original Hugging Face checkpoint to GGUF (the trailing 1 selects the f16 output type)
python3 convert-refact-hf-to-gguf.py ./Refact-1_6B-fim 1

# Run the converted model and compare its output with the original Hugging Face model
./main -m ./Refact-1_6B-fim/ggml-model-f16.gguf -n 300 -p "write a function to multiply two integers in python" --temp 1.0 --top-p 1.0 --top-k 1 --repeat_penalty 1.0

This resolves llama.cpp issue #3061.