---
base_model:
- ArliAI/ArliAI-Llama-3-8B-Formax-v1.0
- Sao10K/L3.1-8B-Niitama-v1.1
- Sao10K/L3-8B-Tamamo-v1
- Sao10K/L3-8B-Stheno-v3.3-32K
- Edgerunners/Lyraea-large-llama-3.1
library_name: transformers
tags:
- mergekit
- merge
---
My first foray into Llama 3.1, just having fun with the merging process and testing theories.
Updated version with higher context [here](https://huggingface.co/kromeurus/L3.1-Siithamo-v0.2-8B).
### Quants
[OG Q8 GGUF](https://huggingface.co/kromquant/L3.1-Siithamo-v0.1-8B-Q8-GGUF) by me.
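If you want the quant locally, here's a minimal sketch with `huggingface_hub`; the exact GGUF filename inside the repo is an assumption, so check the repo's file listing first.
```python
# Sketch: fetch the Q8 GGUF from the quant repo with huggingface_hub.
from huggingface_hub import hf_hub_download

gguf_path = hf_hub_download(
    repo_id="kromquant/L3.1-Siithamo-v0.1-8B-Q8-GGUF",
    filename="L3.1-Siithamo-v0.1-8B.Q8_0.gguf",  # hypothetical filename, verify in the repo
)
print(gguf_path)
```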
### Details & Recommended Settings
Unfortunately, this model still double-lines, but not as often. Dramatic as fuck at times. I haven't tested the context limit yet, but I'm sure it suffered somehow.
Outputs a lot, pretty chatty like Stheno. Pulls some chaotic creativity from Niitama, but it's mellowed out by Tamamo. A little cliche in its writing, but it's almost endearing in a way.
Should follow instructs fine? A little stunted compared to the original model, though I don't think that's a negative.
4K max context even on L3.1 (DAMN U FORMAX).
Rec. Settings:
```
Template: L3
Temperature: 1.35
Min P: 0.1
Repeat Penalty: 1.05
Repeat Penalty Tokens: 256
```
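As a concrete example of how those settings map onto code, here's a rough llama-cpp-python sketch using the Q8 GGUF linked above. The filename, system prompt, and 4K context window are assumptions; most front ends (SillyTavern, KoboldCpp, etc.) expose the same knobs under the same names.
```python
# Rough sketch: applying the recommended settings with llama-cpp-python.
# The GGUF filename is a placeholder; point this at whatever quant you downloaded.
from llama_cpp import Llama

llm = Llama(
    model_path="L3.1-Siithamo-v0.1-8B.Q8_0.gguf",
    n_ctx=4096,              # 4K max context, per the note above
    last_n_tokens_size=256,  # Repeat Penalty Tokens: 256
)

# "Template: L3" -> the Llama 3 instruct format (BOS is added by the loader).
prompt = (
    "<|start_header_id|>system<|end_header_id|>\n\n"
    "You are a roleplay assistant.<|eot_id|>"
    "<|start_header_id|>user<|end_header_id|>\n\n"
    "Describe the tavern as I walk in.<|eot_id|>"
    "<|start_header_id|>assistant<|end_header_id|>\n\n"
)

out = llm(
    prompt,
    max_tokens=512,
    temperature=1.35,
    min_p=0.1,
    repeat_penalty=1.05,
    stop=["<|eot_id|>"],
)
print(out["choices"][0]["text"])
```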
### Models Merged & Merge Theory
The following models were included in the merge:
* [Edgerunners/Lyraea-large-llama-3.1](https://huggingface.co/Edgerunners/Lyraea-large-llama-3.1)
* [Sao10K/L3-8B-Stheno-v3.3-32K](https://huggingface.co/Sao10K/L3-8B-Stheno-v3.3-32K)
* [Sao10K/L3.1-8B-Niitama-v1.1](https://huggingface.co/Sao10K/L3.1-8B-Niitama-v1.1)
* [Sao10K/L3-8B-Tamamo-v1](https://huggingface.co/Sao10K/L3-8B-Tamamo-v1)
* [ArliAI/ArliAI-Llama-3-8B-Formax-v1.0](https://huggingface.co/ArliAI/ArliAI-Llama-3-8B-Formax-v1.0)
Using Edgerunners' Lyraea as the L3.1 base, model stock merged L3.1 Niitama, Stheno 3.3, and Tamamo atop each other. Then, to curb the L3 tendencies and add some instruct-following
capability, added some Formax in a dare_linear merge. At least for updating L3 to L3.1, doing TIES anything results in a 'shittier' model.
### Config
```yaml
models:
  - model: Sao10K/L3.1-8B-Niitama-v1.1
  - model: Sao10K/L3-8B-Stheno-v3.3-32K
  - model: Sao10K/L3-8B-Tamamo-v1
base_model: Edgerunners/Lyraea-large-llama-3.1
parameters:
  normalize: false
  int8_mask: true
merge_method: model_stock
dtype: float32
out_dtype: bfloat16
name: siitamol3.1
---
models:
  - model: ArliAI/ArliAI-Llama-3-8B-Formax-v1.0
    parameters:
      weight: [0.5, 0.3, 0.2, 0.1]
  - model: siitamol3.1
    parameters:
      weight: [0.5, 0.7, 0.8, 1]
base_model: siitamol3.1
parameters:
  normalize: false
  int8_mask: true
merge_method: dare_linear
dtype: float32
out_dtype: bfloat16
```
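For anyone wanting to reproduce the merge: the config is two documents, with the model_stock pass naming its output `siitamol3.1` for the dare_linear pass to consume. Below is a hedged sketch using mergekit's Python API, running the two documents in sequence; the config filename and output paths are placeholders, and mergekit's multi-document runner (if present in your install) handles configs like this directly without the manual path substitution.
```python
# Hedged sketch: running the two-stage config above with mergekit's Python API.
# Paths and filenames are placeholders; the second stage points at the first
# stage's output directory instead of the intermediate name 'siitamol3.1'.
import yaml
from mergekit.config import MergeConfiguration
from mergekit.merge import MergeOptions, run_merge

with open("siithamo-v0.1.yaml", encoding="utf-8") as fp:
    stage1, stage2 = list(yaml.safe_load_all(fp))

stage1.pop("name", None)  # 'name' only matters to the multi-document runner

# Stage 1: model_stock of Niitama, Stheno 3.3, and Tamamo onto the Lyraea base.
run_merge(
    MergeConfiguration.model_validate(stage1),
    out_path="./siitamol3.1",
    options=MergeOptions(cuda=True, copy_tokenizer=True),
)

# Stage 2: dare_linear blend of Formax into the intermediate, swapping the
# name reference for the local path produced above.
stage2_text = yaml.dump(stage2).replace("siitamol3.1", "./siitamol3.1")
run_merge(
    MergeConfiguration.model_validate(yaml.safe_load(stage2_text)),
    out_path="./L3.1-Siithamo-v0.1-8B",
    options=MergeOptions(cuda=True, copy_tokenizer=True),
)
```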