metadata

language:
  - en
license: cc-by-nc-4.0
model_name: Octopus-V4-GGUF
base_model: NexaAIDev/Octopus-v4
inference: false
model_creator: NexaAIDev
quantized_by: Nexa AI, Inc.
tags:
  - function calling
  - on-device language model
  - gguf
  - llama cpp

Octopus V4-GGUF: Graph of language models

- Original Model - Nexa AI Website - Octopus-v4 GitHub - ArXiv - Domain LLM Leaderboard

Acknowledgement:
We sincerely thank our community members, Mingyuan and Zoey, for their extraordinary contributions to this quantization effort. For our original Hugging Face model, see Octopus-v4.

(Recommended) Run with llama.cpp

Clone and compile:

   git clone https://github.com/ggerganov/llama.cpp
   cd llama.cpp
   # Compile the source code:
   make

Prepare the Input Prompt File:

Navigate to the prompts folder inside llama.cpp and create a new file named chat-with-octopus.txt.

chat-with-octopus.txt:

   User:
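
The file can also be created from the terminal. This is a minimal sketch, assuming it is run from the root of the llama.cpp checkout, so that the relative path matches the one passed to -f in the run command:

```shell
# Assumption: the current directory is the llama.cpp repository root.
mkdir -p prompts

# Write the single-line prompt file used by the -f flag.
printf 'User:\n' > prompts/chat-with-octopus.txt

# Show the file to confirm its content.
cat prompts/chat-with-octopus.txt
```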

Execute the Model:

Run the following command in the terminal:

   ./main -m ./path/to/octopus-v4-Q4_K_M.gguf -c 512 -b 2048 -n 256 -t 1 --repeat_penalty 1.0 --top_k 0 --top_p 1.0 --color -i -r "User:" -f prompts/chat-with-octopus.txt

Example prompt to interact with the model:

  <|system|>You are a router. Below is the query from the users, please call the correct function and generate the parameters to call the function.<|end|><|user|>Tell me the result of derivative of x^3 when x is 2?<|end|><|assistant|>
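
To avoid retyping the template for every question, the prompt can be assembled by a small shell function. This is only a sketch: build_prompt is a hypothetical helper name, not part of llama.cpp or Ollama, and it simply wraps a query in the system and user markers shown above.

```shell
# Hypothetical helper: wrap a user query in the router prompt template.
# The template text is copied from the example prompt; only the query varies.
build_prompt() {
  printf '<|system|>You are a router. Below is the query from the users, please call the correct function and generate the parameters to call the function.<|end|><|user|>%s<|end|><|assistant|>' "$1"
}

# Print the full prompt for one question.
build_prompt "Tell me the result of derivative of x^3 when x is 2?"
```

The printed string can be pasted into the interactive session, or passed as the prompt argument of whichever runner is used.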

Run with Ollama

Create a Modelfile in your directory and include a FROM statement with the path to your local model:

FROM ./path/to/octopus-v4-Q4_K_M.gguf
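
As a sketch, the same Modelfile can be written from the terminal. The path is the placeholder used above and has to be replaced with the actual location of the downloaded .gguf file:

```shell
# Write a one-line Modelfile; ./path/to/ is a placeholder, not a real path.
printf 'FROM ./path/to/octopus-v4-Q4_K_M.gguf\n' > Modelfile

# Show the file to confirm its content.
cat Modelfile
```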

Use the following command to add the model to Ollama:

ollama create octopus-v4-Q4_K_M -f Modelfile

Verify that the model has been successfully imported:

ollama ls

Run the model:

ollama run octopus-v4-Q4_K_M "<|system|>You are a router. Below is the query from the users, please call the correct function and generate the parameters to call the function.<|end|><|user|>Tell me the result of derivative of x^3 when x is 2?<|end|><|assistant|>"

Dataset and Benchmark

We used questions from MMLU to evaluate performance.
The evaluation was run with the Ollama llm-benchmark method.
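
For readers who want to reproduce a tokens-per-second figure, the sketch below shows the usual arithmetic. It assumes the rate is derived the way Ollama reports it, from the eval_count (generated tokens) and eval_duration (nanoseconds) fields of a generation response. Whether the numbers in the table below were computed exactly this way is an assumption, and tokens_per_second is a hypothetical helper name.

```shell
# Hypothetical helper: tokens per second from a token count and a duration.
# $1 = eval_count (number of generated tokens)
# $2 = eval_duration in nanoseconds
tokens_per_second() {
  awk -v n="$1" -v d="$2" 'BEGIN { printf "%.2f\n", n / d * 1e9 }'
}

# Illustrative values only: 100 tokens generated in 2 seconds.
tokens_per_second 100 2000000000   # prints 50.00
```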

Quantized GGUF Models

Name	Quant method	Bits	Size	Response (tokens/second)	Use Cases
Octopus-v4.gguf			7.64 GB	27.64	extremely large
Octopus-v4-Q2_K.gguf	Q2_K	2	1.42 GB	54.20	extremely not recommended, high loss
Octopus-v4-Q3_K.gguf	Q3_K	3	1.96 GB	51.22	not recommended
Octopus-v4-Q3_K_S.gguf	Q3_K_S	3	1.68 GB	51.78	not very recommended
Octopus-v4-Q3_K_M.gguf	Q3_K_M	3	1.96 GB	50.86	not very recommended
Octopus-v4-Q3_K_L.gguf	Q3_K_L	3	2.09 GB	50.05	not very recommended
Octopus-v4-Q4_0.gguf	Q4_0	4	2.18 GB	65.76	good quality, recommended
Octopus-v4-Q4_1.gguf	Q4_1	4	2.41 GB	69.01	slow, good quality, recommended
Octopus-v4-Q4_K.gguf	Q4_K	4	2.39 GB	55.76	slow, good quality, recommended
Octopus-v4-Q4_K_S.gguf	Q4_K_S	4	2.19 GB	53.98	high quality, recommended
Octopus-v4-Q4_K_M.gguf	Q4_K_M	4	2.39 GB	58.39	some functions loss, not very recommended
Octopus-v4-Q5_0.gguf	Q5_0	5	2.64 GB	61.98	slow, good quality
Octopus-v4-Q5_1.gguf	Q5_1	5	2.87 GB	63.44	slow, good quality
Octopus-v4-Q5_K.gguf	Q5_K	5	2.82 GB	58.28	moderate speed, recommended
Octopus-v4-Q5_K_S.gguf	Q5_K_S	5	2.64 GB	59.95	moderate speed, recommended
Octopus-v4-Q5_K_M.gguf	Q5_K_M	5	2.82 GB	53.31	fast, good quality, recommended
Octopus-v4-Q6_K.gguf	Q6_K	6	3.14 GB	52.15	large, not very recommended
Octopus-v4-Q8_0.gguf	Q8_0	8	4.06 GB	50.10	very large, good quality
Octopus-v4-f16.gguf	f16	16	7.64 GB	30.61	extremely large

Quantized with llama.cpp