---
pipeline_tag: text-generation
inference: true
widget:
- text: 'def print_hello_world():'
example_title: Hello world
group: Python
license: bigscience-openrail-m
pretrain-datasets:
- books
- arxiv
- c4
- falcon-refinedweb
- wiki
- github-issues
- stack_markdown
- self-made dataset of permissive github code
datasets:
- bigcode/the-stack-dedup
- rombodawg/2XUNCENSORED_MegaCodeTraining188k
- bigcode/commitpackft
library_name: llama.cpp
tags:
- code
language:
- en
---
# Refact 1.6B FIM GGUF
## Introduction
The Refact 1.6B FIM GGUF model is a state-of-the-art AI-powered coding assistant developed by Small Magellanic Cloud AI Ltd, packaged in the GGUF format used by llama.cpp. FIM stands for fill-in-the-middle: the model can complete code between a given prefix and suffix, not just continue from a prompt. It assists developers with code completion, refactoring, and chat-based interactions, excelling in code-related natural language understanding and generation tasks.
## Quantized Model Files
The model comes in various quantized versions to suit different computational needs:
- **refact-1.6B-fim-q4_0.gguf**: A 4-bit quantized model with a file size of 878 MB.
- **refact-1.6B-fim-q5_0.gguf**: A 5-bit quantized model with a file size of 1.1 GB.
- **refact-1.6B-fim-q8_0.gguf**: An 8-bit quantized model with a file size of 1.6 GB.
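As a rough sketch of getting started, a single quantized file can be fetched with `huggingface-cli` and run with llama.cpp's `main` binary. The repository path below is a placeholder, not the real repo name; substitute the actual GGUF repository:

```sh
# Download one quantized file (repo path is a placeholder -- adjust to the real repo)
huggingface-cli download <repo-owner>/refact-1.6b-fim-gguf refact-1.6B-fim-q4_0.gguf --local-dir models

# Run the 4-bit model; smaller quantizations use less memory at some cost in accuracy
./main -m models/refact-1.6B-fim-q4_0.gguf -n 300 -p "write a function to multiply two integers in python"
```

The q4_0 file is the smallest and fastest to load; q8_0 stays closest to the original f16 weights.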
## Features and Usage
The model is versatile and can be employed for:
- Code completion
- Code refactoring
- Chat-based interactions
### Example Usage
Here's a sample shell command to invoke the model:
```sh
# Sample shell command to use the model
./main -m models/smallcloudai/Refact-1_6B-fim/ggml-model-f16.gguf -n 300 -p "write a function to multiply two integers in python" --temp 1.0 --top-p 1.0 --top-k 1 --repeat_penalty 1.0
```
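Since the model was trained for fill-in-the-middle, it can also complete code between a prefix and a suffix. A minimal sketch, assuming the model uses StarCoder-style FIM special tokens (`<fim_prefix>`, `<fim_suffix>`, `<fim_middle>`); check the model's tokenizer configuration to confirm the exact token names:

```sh
# FIM prompt: the model generates the code between the prefix and the suffix
# (the special token names are an assumption; verify against the tokenizer config)
./main -m models/smallcloudai/Refact-1_6B-fim/ggml-model-f16.gguf -n 128 \
  -p "<fim_prefix>def multiply(a, b):
<fim_suffix>
    return result
<fim_middle>"
```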
## Performance Metrics
Despite having roughly half the parameters, the model outperforms larger code models such as StableCode-3B and ReplitCode-3B on code completion, as evidenced by the HumanEval results below.
| Model | Size | HumanEval pass@1 | HumanEval pass@10 |
|----------------------|-------|------------------|-------------------|
| **Refact-1.6-fim** | 1.6b | 32.0% | 53.0% |
| StableCode | 3b | 20.2% | 33.8% |
| ReplitCode v1 | 3b | 21.9% | N/A |
## Installation and Setup
The model can be integrated into your IDE via the [Refact plugin](https://refact.ai/). For self-hosting, an [open-source Docker container](https://github.com/smallcloudai/refact) is available.
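For self-hosting, the launch might look like the sketch below. The image name, port, and flags here are assumptions; consult the linked repository's README for the current, authoritative instructions:

```sh
# Start the self-hosted Refact server (image name, port, and volume are assumptions)
docker run -d --rm -p 8008:8008 \
  -v refact-perm-storage:/perm_storage \
  --gpus all smallcloudai/refact_self_hosting
```

Once the container is running, the IDE plugin can be pointed at the local server instead of the hosted service.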
## Limitations and Bias
The model primarily focuses on English text, which may result in lower performance for non-English languages.
## Technical Specifications
- **Architecture**: LLaMA-like model with multi-query attention
- **Training Tokens**: 1.2T for pretraining, 40B for fine-tuning
- **Precision**: bfloat16
- **Training Time**: 28 days
## License
The model is licensed under the BigScience OpenRAIL-M v1 license agreement.
## Citation
If you use this model in your work, please cite it by linking back to the following page for proper attribution:
[Refact 1.6B FIM Model](https://huggingface.co/smallcloudai/Refact-1_6B-fim)
## Acknowledgments
Special thanks to [ds5t5](https://github.com/ggerganov/llama.cpp/pull/3329) for implementing the conversion of the model's tensors from the Hugging Face format to GGUF. Their work has been instrumental in enhancing the model's versatility.
### Example Command for Testing
To verify the converted GGUF model against the original Hugging Face weights, you can use the following commands:
```sh
# Example command for testing against Hugging Face
python3 convert-refact-hf-to-gguf.py ./Refact-1_6B-fim 1
./main -m ./Refact-1_6B-fim/ggml-model-f16.gguf -n 300 -p "write a function to multiply two integers in python" --temp 1.0 --top-p 1.0 --top-k 1 --repeat_penalty 1.0
```
This resolves llama.cpp issue [#3061](https://github.com/ggerganov/llama.cpp/issues/3061).