|
--- |
|
base_model: |
|
- ArliAI/ArliAI-Llama-3-8B-Formax-v1.0 |
|
- Sao10K/L3.1-8B-Niitama-v1.1 |
|
- Sao10K/L3-8B-Tamamo-v1 |
|
- Sao10K/L3-8B-Stheno-v3.3-32K |
|
- Edgerunners/Lyraea-large-llama-3.1 |
|
- gradientai/Llama-3-8B-Instruct-Gradient-1048k |
|
library_name: transformers |
|
tags: |
|
- mergekit |
|
- merge |
|
--- |
|
Second (third) time's the charm. After fighting with Formax trying to increase its max context to something that isn't 4k, I spat out this merge as a result. It still maintains a lot of v0.1's properties: creativity, literacy, and chattiness. Knowing everything I've learned making this, time to dive headfirst into making an L3.1 space whale.
|
|
|
I stg LLMs are testing me. |
|
|
|
### Quants |
|
|
|
[OG Q8 GGUF](https://huggingface.co/kromquant/L3.1-Siithamo-v0.2b-8B-Q8-GGUF) by me. |
|
|
|
[GGUFs](https://huggingface.co/mradermacher/L3.1-Siithamo-v0.2-8B-GGUF) by [mradermacher](https://huggingface.co/mradermacher).
|
|
|
### Details & Recommended Settings |
|
|
|
Outputs a lot, pretty fucking chatty like Stheno. Pulls some chaotic creativity from Niitama, but it's mellowed out by Tamamo. Flowery, dramatic writing at times.

Starts repeating around 8k at basic settings, but DRY eliminates it and the model can handle 32k context. Very good instruction following.
|
|
|
Rec. Settings: |
|
```
Template: L3
Temperature: 1.4
Min P: 0.1
Repeat Penalty: 1.05
Repeat Penalty Tokens: 256
Dyn Temp: 0.9-1.05 at 0.1
Smooth Sampl: 0.18
```
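
If you run the GGUF through llama.cpp directly, the core samplers above map roughly to the flags below. This is a sketch, not a tested command: the filename is illustrative, dynamic temperature and smoothing are configured differently (or not at all) depending on the build and front end, and the DRY value shown is a common default rather than anything tuned for this model.

```
# core samplers only; dyn temp / smoothing omitted (front-end specific)
./llama-server -m L3.1-Siithamo-v0.2b-8B-Q8_0.gguf -c 32768 \
  --temp 1.4 --min-p 0.1 \
  --repeat-penalty 1.05 --repeat-last-n 256 \
  --dry-multiplier 0.8
```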
|
|
|
### Models Merged & Merge Theory |
|
|
|
The following models were included in the merge: |
|
* [Edgerunners/Lyraea-large-llama-3.1](https://huggingface.co/Edgerunners/Lyraea-large-llama-3.1) |
|
* [Sao10K/L3-8B-Stheno-v3.3-32K](https://huggingface.co/Sao10K/L3-8B-Stheno-v3.3-32K) |
|
* [Sao10K/L3.1-8B-Niitama-v1.1](https://huggingface.co/Sao10K/L3.1-8B-Niitama-v1.1) |
|
* [Sao10K/L3-8B-Tamamo-v1](https://huggingface.co/Sao10K/L3-8B-Tamamo-v1) |
|
* [ArliAI/ArliAI-Llama-3-8B-Formax-v1.0](https://huggingface.co/ArliAI/ArliAI-Llama-3-8B-Formax-v1.0) |
|
* [gradientai/Llama-3-8B-Instruct-Gradient-1048k](https://huggingface.co/gradientai/Llama-3-8B-Instruct-Gradient-1048k) |
|
|
|
Compared to v0.1, the siithamol3.1 part stayed the same. To 'increase' the context of Formax, I just chopped off the latter half and replaced it with a ~1M context model, and that seemed to do the trick (after doing a bunch of other shit, this was the simplest and easiest route). Then I changed from dare_linear to breadcrumbs for the final merge, which gave better output without the hassle; see the config below. Again, anything TIES-based didn't work nearly as well.
|
|
|
### Config |
|
|
|
```yaml |
|
slices:
  - sources:
      - layer_range: [0, 16]
        model: ArliAI/ArliAI-Llama-3-8B-Formax-v1.0
  - sources:
      - layer_range: [16, 32]
        model: gradientai/Llama-3-8B-Instruct-Gradient-1048k
parameters:
  int8_mask: true
merge_method: passthrough
dtype: float32
out_dtype: bfloat16
name: formax.ext
---
models:
  - model: Sao10K/L3.1-8B-Niitama-v1.1
  - model: Sao10K/L3-8B-Stheno-v3.3-32K
  - model: Sao10K/L3-8B-Tamamo-v1
base_model: Edgerunners/Lyraea-large-llama-3.1
parameters:
  normalize: false
  int8_mask: true
merge_method: model_stock
dtype: float32
out_dtype: bfloat16
name: siithamol3.1
---
models:
  - model: siithamol3.1
    parameters:
      weight: [0.5, 0.8, 0.9, 1]
      density: 0.9
      gamma: 0.01
  - model: formax.ext
    parameters:
      weight: [0.5, 0.2, 0.1, 0]
      density: 0.9
      gamma: 0.01
base_model: siithamol3.1
parameters:
  normalize: false
  int8_mask: true
merge_method: breadcrumbs
dtype: float32
out_dtype: bfloat16
|
``` |
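
Since this is a multi-stage config (three documents separated by `---`, with `name:` targets feeding later stages), it's meant to be run through mergekit's multi-document entry point. A rough sketch, assuming a recent mergekit install; the config filename and output path are illustrative:

```
pip install mergekit
# mergekit-mega builds the named intermediates (formax.ext, siithamol3.1)
# first, then runs the final breadcrumbs merge
mergekit-mega siithamo-v0.2.yml ./L3.1-Siithamo-v0.2-8B --cuda
```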