---
language:
- en
license: apache-2.0
model_name: Octopus-v4
base_model: NexaAIDev/Octopus-v4
inference: false
model_creator: NexaAIDev
quantized_by: Second State Inc.
tags:
- function calling
- on-device language model
- android
---

# Octopus-v4-GGUF

## Original Model

[NexaAIDev/Octopus-v4](https://huggingface.co/NexaAIDev/Octopus-v4)

## Run with [Ollama](https://github.com/ollama/ollama)

```bash
ollama run NexaAIDev/octopus-v4-q4_k_m
```
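Beyond the CLI, a locally running Ollama server also exposes an HTTP API. The sketch below is a minimal example of calling it from Python; the endpoint path, default port, and `response` field follow Ollama's documented defaults, while the helper names are illustrative.

```python
import json
import urllib.request

# Default endpoint of a locally running Ollama server (assumes the default port 11434).
OLLAMA_URL = "http://localhost:11434/api/generate"

def build_generate_payload(model: str, prompt: str) -> dict:
    """Build the JSON body for Ollama's /api/generate endpoint (non-streaming)."""
    return {"model": model, "prompt": prompt, "stream": False}

def generate(prompt: str, model: str = "NexaAIDev/octopus-v4-q4_k_m") -> str:
    """POST the prompt to the local Ollama server and return the generated text."""
    body = json.dumps(build_generate_payload(model, prompt)).encode("utf-8")
    req = urllib.request.Request(
        OLLAMA_URL, data=body, headers={"Content-Type": "application/json"}
    )
    with urllib.request.urlopen(req) as resp:
        return json.loads(resp.read())["response"]

# Usage (requires `ollama serve` running with the model pulled):
# print(generate("Query: Tell me the result of derivative of x^3 when x is 2?"))
```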

Input example:

```text
Query: Tell me the result of derivative of x^3 when x is 2?

# <nexa_4> represents the math GPT.
Response: <nexa_4> ('Determine the derivative of the function f(x) = x^3 at the point where x equals 2, and interpret the result within the context of rate of change and tangent slope.')<nexa_end>
```

### Dataset and Benchmark

* Evaluation questions were drawn from [MMLU](https://github.com/hendrycks/test).
* Throughput was measured with the Ollama [llm-benchmark](https://github.com/MinhNgyuen/llm-benchmark) tool.
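The tokens-per-second figures reported for each quantization are generation throughput: tokens emitted divided by wall-clock generation time. Assuming the `eval_count` and `eval_duration` (nanoseconds) fields that Ollama reports for a completed generation, the conversion is a one-liner:

```python
def tokens_per_second(eval_count: int, eval_duration_ns: int) -> float:
    """Generation throughput: tokens emitted divided by generation time in seconds."""
    return eval_count / (eval_duration_ns / 1_000_000_000)

# e.g. 500 tokens in 9.05 s of generation time is about 55 tokens/s
```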

## Quantized GGUF Models

| Name | Quant method | Bits | Size | Throughput (tokens/s) | Use Cases |
| ---------------------- | ------------ | ---- | ------- | --------------------- | ------------------------------------------ |
| Octopus-v4.gguf        |              |      | 7.20 GB | 27.64                 | extremely large                            |
| Octopus-v4-Q2_K.gguf   | Q2_K         | 2    | 1.32 GB | 54.20                 | high quality loss, strongly not recommended |
| Octopus-v4-Q3_K.gguf   | Q3_K         | 3    | 1.82 GB | 51.22                 | not recommended                            |
| Octopus-v4-Q3_K_S.gguf | Q3_K_S       | 3    | 1.57 GB | 51.78                 | not recommended                            |
| Octopus-v4-Q3_K_M.gguf | Q3_K_M       | 3    | 1.82 GB | 50.86                 | not recommended                            |
| Octopus-v4-Q3_K_L.gguf | Q3_K_L       | 3    | 1.94 GB | 50.05                 | not recommended                            |
| Octopus-v4-Q4_0.gguf   | Q4_0         | 4    | 2.03 GB | 65.76                 | good quality, recommended                  |
| Octopus-v4-Q4_1.gguf   | Q4_1         | 4    | 2.24 GB | 69.01                 | slow, good quality, recommended            |
| Octopus-v4-Q4_K.gguf   | Q4_K         | 4    | 2.23 GB | 55.76                 | slow, good quality, recommended            |
| Octopus-v4-Q4_K_S.gguf | Q4_K_S       | 4    | 2.04 GB | 53.98                 | high quality, recommended                  |
| Octopus-v4-Q4_K_M.gguf | Q4_K_M       | 4    | 1.51 GB | 58.39                 | some loss of function accuracy, not recommended |
| Octopus-v4-Q5_0.gguf   | Q5_0         | 5    | 2.45 GB | 61.98                 | slow, good quality                         |
| Octopus-v4-Q5_1.gguf   | Q5_1         | 5    | 2.67 GB | 63.44                 | slow, good quality                         |
| Octopus-v4-Q5_K.gguf   | Q5_K         | 5    | 2.58 GB | 58.28                 | moderate speed, recommended                |
| Octopus-v4-Q5_K_S.gguf | Q5_K_S       | 5    | 2.45 GB | 59.95                 | moderate speed, recommended                |
| Octopus-v4-Q5_K_M.gguf | Q5_K_M       | 5    | 2.62 GB | 53.31                 | fast, good quality, recommended            |
| Octopus-v4-Q6_K.gguf   | Q6_K         | 6    | 2.91 GB | 52.15                 | large, not recommended                     |
| Octopus-v4-Q8_0.gguf   | Q8_0         | 8    | 3.78 GB | 50.10                 | very large, good quality                   |
| Octopus-v4-f16.gguf    | f16          | 16   | 7.20 GB | 30.61                 | extremely large                            |

_Quantized with llama.cpp_
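When choosing among the files above, the Size column directly implies the disk/memory savings relative to the full-precision f16 file. A quick sanity-check sketch, with the numbers taken from the table (the helper name is illustrative):

```python
def size_reduction(quant_gb: float, full_gb: float) -> float:
    """Fraction of disk/memory saved by a quantized file versus the f16 baseline."""
    return 1 - quant_gb / full_gb

# Q4_K_M (1.51 GB) vs f16 (7.20 GB): roughly 79% smaller
print(round(size_reduction(1.51, 7.20), 2))
```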