
	---
	base_model: allura-org/MS-Meadowlark-22B
	library_name: transformers
	tags:
	- mergekit
	- merge
	- llama-cpp
	- gguf-my-repo
	license: other
	license_name: mrl
	license_link: https://mistral.ai/licenses/MRL-0.1.md
	---

	# Triangle104/MS-Meadowlark-22B-Q4_K_M-GGUF
	This model was converted to GGUF format from [`allura-org/MS-Meadowlark-22B`](https://huggingface.co/allura-org/MS-Meadowlark-22B) using llama.cpp via ggml.ai's [GGUF-my-repo](https://huggingface.co/spaces/ggml-org/gguf-my-repo) space.
	Refer to the [original model card](https://huggingface.co/allura-org/MS-Meadowlark-22B) for more details on the model.

	Model details:
	-
	A roleplay and storywriting model based on Mistral Small 22B.

	GGUF models: https://huggingface.co/mradermacher/MS-Meadowlark-22B-GGUF/

	EXL2 models: https://huggingface.co/CalamitousFelicitousness/MS-Meadowlark-22B-exl2

	Datasets used in this model:

	- Dampfinchen/Creative_Writing_Multiturn at 16k
	- Fizzarolli/rosier-dataset + Alfitaria/body-inflation-org at 16k
	- ToastyPigeon/SpringDragon at 8k

	Each dataset was used to train a separate component model on top of Mistral Small Instruct, and the component models were then merged along with nbeerbower/Mistral-Small-Gutenberg-Doppel-22B to create Meadowlark.

	I tried different blends of the component models, and this one seems to be the most stable while retaining the creativity and unpredictability added by the training data.

	### Instruct Format

	Rosier/bodyinf and SpringDragon were trained in completion format. This model should work with Kobold Lite in Adventure Mode and Story Mode.

	Creative_Writing_Multiturn and Gutenberg-Doppel were trained using the official instruct format of Mistral Small Instruct:

	`<s>[INST] {User message}[/INST] {Assistant response}</s>`

	This is the Mistral Small V2&V3 preset in SillyTavern and Kobold Lite.
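The instruct format above can be assembled mechanically. The following is a minimal bash sketch, not something taken from the original model card: `build_mistral_prompt` and the `MISTRAL_NO_BOS` variable are hypothetical names introduced for illustration. It takes alternating user and assistant messages and joins them in the `[INST] ... [/INST] ... </s>` layout. Some runtimes may add the `<s>` token themselves during tokenization, so the sketch lets you leave it out.

```bash
# Hypothetical helper (not from the model card): builds a prompt in the
# Mistral Small instruct format from alternating user/assistant messages.
build_mistral_prompt() {
  local bos='<s>' prompt='' i=0 msg
  # Assumption: set MISTRAL_NO_BOS=1 if your runtime prepends <s> on its own.
  if [ "${MISTRAL_NO_BOS:-0}" = "1" ]; then
    bos=''
  fi
  for msg in "$@"; do
    if [ $((i % 2)) -eq 0 ]; then
      prompt+="[INST] ${msg}[/INST]"   # even positions: user turns
    else
      prompt+=" ${msg}</s>"            # odd positions: assistant turns
    fi
    i=$((i + 1))
  done
  printf '%s%s' "$bos" "$prompt"
}
```

Calling `build_mistral_prompt "Hi" "Hello." "Go on"` leaves the final user turn open so the model writes the next assistant response.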

	For SillyTavern in particular, I've had better luck getting good output from Mistral Small using a custom instruct template that formats the assembled context as a single user turn. This prevents SillyTavern from confusing the model by assembling user/assistant turns in a nonstandard way. Note: This preset is not compatible with Stepped Thinking; use the Mistral V2&V3 preset for that.
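The custom template itself is not reproduced in this card, so the snippet below is only a sketch of the underlying idea, not the author's preset. It assumes the frontend has already assembled the whole context (system text, character card, chat log) as plain text, and wraps that text in one `[INST] ... [/INST]` pair. `single_user_turn` is a hypothetical name.

```bash
# Hypothetical helper: wrap whatever assembled context arrives on stdin in a
# single user turn, so the model never sees frontend-built user/assistant turns.
single_user_turn() {
  local context
  context="$(cat)"    # command substitution drops trailing newlines
  printf '[INST] %s[/INST]' "$context"
}

# Usage (illustrative transcript, not from the model card):
# printf 'Narrator: The door creaks open.\nAlice: Who is there?\n' | single_user_turn
```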

	---
	## Use with llama.cpp
	Install llama.cpp through brew (works on Mac and Linux)

	```bash
	brew install llama.cpp
	```
	Invoke the llama.cpp server or the CLI.

	### CLI:
	```bash
	llama-cli --hf-repo Triangle104/MS-Meadowlark-22B-Q4_K_M-GGUF --hf-file ms-meadowlark-22b-q4_k_m.gguf -p "The meaning to life and the universe is"
	```

	### Server:
	```bash
	llama-server --hf-repo Triangle104/MS-Meadowlark-22B-Q4_K_M-GGUF --hf-file ms-meadowlark-22b-q4_k_m.gguf -c 2048
	```
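Once the server is running, it can be queried over HTTP. The sketch below assumes the server listens on llama.cpp's default address (`http://127.0.0.1:8080`) and uses its `/completion` endpoint with the `prompt` and `n_predict` fields. The helper names `completion_body` and `llama_complete` and the `LLAMA_SERVER_URL` variable are hypothetical, and the JSON escaping only covers backslashes, double quotes, and newlines.

```bash
# Assumption: llama-server is reachable here; override if you passed --host/--port.
LLAMA_SERVER_URL="${LLAMA_SERVER_URL:-http://127.0.0.1:8080}"

# Hypothetical helper: build a JSON request body for the /completion endpoint.
completion_body() {
  local prompt="$1" n_predict="${2:-128}"
  prompt=${prompt//\\/\\\\}        # escape backslashes first
  prompt=${prompt//\"/\\\"}        # then double quotes
  prompt=${prompt//$'\n'/\\n}      # then newlines
  printf '{"prompt": "%s", "n_predict": %s}' "$prompt" "$n_predict"
}

# Hypothetical helper: send the prompt to the running server and print the raw JSON reply.
llama_complete() {
  curl -s "$LLAMA_SERVER_URL/completion" \
    -H 'Content-Type: application/json' \
    -d "$(completion_body "$@")"
}

# Usage: llama_complete '[INST] Write the opening line of a story.[/INST]' 64
```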

	Note: You can also use this checkpoint directly through the [usage steps](https://github.com/ggerganov/llama.cpp?tab=readme-ov-file#usage) listed in the llama.cpp repo.

	Step 1: Clone llama.cpp from GitHub.
	```bash
	git clone https://github.com/ggerganov/llama.cpp
	```

	Step 2: Move into the llama.cpp folder and build it with the `LLAMA_CURL=1` flag, along with any other hardware-specific flags (for example, `LLAMA_CUDA=1` for Nvidia GPUs on Linux).
	```bash
	cd llama.cpp && LLAMA_CURL=1 make
	```

	Step 3: Run inference through the main binary.
	```bash
	./llama-cli --hf-repo Triangle104/MS-Meadowlark-22B-Q4_K_M-GGUF --hf-file ms-meadowlark-22b-q4_k_m.gguf -p "The meaning to life and the universe is"
	```
	or
	```bash
	./llama-server --hf-repo Triangle104/MS-Meadowlark-22B-Q4_K_M-GGUF --hf-file ms-meadowlark-22b-q4_k_m.gguf -c 2048
	```