---
base_model:
- ArliAI/ArliAI-Llama-3-8B-Formax-v1.0
- Sao10K/L3.1-8B-Niitama-v1.1
- Sao10K/L3-8B-Tamamo-v1
- Sao10K/L3-8B-Stheno-v3.3-32K
- Edgerunners/Lyraea-large-llama-3.1
- gradientai/Llama-3-8B-Instruct-Gradient-1048k
library_name: transformers
tags:
- mergekit
- merge
---

Second (third) time's the charm. After fighting with Formax trying to increase its max context to something that isn't 4k, spat out this merge as a result. Still maintains a lot of v0.1's properties: creativity, literacy, and chattiness.

Knowing everything I've learned making this, time to dive headfirst into making an L3.1 space whale. I stg LLMs are testing me.

### Quants

[OG Q8 GGUF](https://huggingface.co/kromquant/L3.1-Siithamo-v0.2b-8B-Q8-GGUF) by me.

### Details & Recommended Settings

(Still testing; details subject to change)

Outputs a lot, pretty chatty like Stheno. Pulls some chaotic creativity from Niitama but it's mellowed out with Tamamo. A little cliche writing, but it's almost endearing in a way. Sticks to instructs fairly well and changes to match {user}'s input in length and verbosity at times. Well balanced in all RP uses. I've tested this model up to 8-9k context without any repetition, but idk what the true context limit of this model is yet.

Rec.
Settings:
```
Template: L3
Temperature: 1.4
Min P: 0.1
Repeat Penalty: 1.05
Repeat Penalty Tokens: 256
```

### Models Merged & Merge Theory

The following models were included in the merge:
* [Edgerunners/Lyraea-large-llama-3.1](https://huggingface.co/Edgerunners/Lyraea-large-llama-3.1)
* [Sao10K/L3-8B-Stheno-v3.3-32K](https://huggingface.co/Sao10K/L3-8B-Stheno-v3.3-32K)
* [Sao10K/L3.1-8B-Niitama-v1.1](https://huggingface.co/Sao10K/L3.1-8B-Niitama-v1.1)
* [Sao10K/L3-8B-Tamamo-v1](https://huggingface.co/Sao10K/L3-8B-Tamamo-v1)
* [ArliAI/ArliAI-Llama-3-8B-Formax-v1.0](https://huggingface.co/ArliAI/ArliAI-Llama-3-8B-Formax-v1.0)
* [gradientai/Llama-3-8B-Instruct-Gradient-1048k](https://huggingface.co/gradientai/Llama-3-8B-Instruct-Gradient-1048k)

Compared to v0.1, the siithamol3.1 part stayed the same. To 'increase' the context of Formax, just chopped off the latter half and replaced it with a ~1M context model and that seemed to do the trick (after doing a bunch of other shit, this was the simplest and easiest route). Then, changed from dare_linear to breadcrumbs for the final merge, which gave a better output without the hassle. Again, TIES anything didn't work nearly as well.
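For a rough picture of what the `density` and `gamma` values in the final breadcrumbs merge do, here's a minimal pure-Python sketch. It assumes mergekit's documented reading of those parameters (`gamma` is the fraction of largest-magnitude differences from the base that get dropped, `density` is the fraction kept overall). The helper `breadcrumbs_mask` and the toy numbers are made up for illustration; this is not mergekit's actual code.

```python
def breadcrumbs_mask(deltas, density=0.9, gamma=0.01):
    """Return a keep-mask over deltas (fine-tuned weight minus base weight).

    Assumption: the top `gamma` fraction by magnitude is dropped as outliers,
    and the bottom (1 - density - gamma) fraction is dropped as noise.
    """
    n = len(deltas)
    # Indices ordered from smallest to largest magnitude.
    order = sorted(range(n), key=lambda i: abs(deltas[i]))
    n_top = int(round(gamma * n))                        # outliers to drop
    n_bottom = int(round((1.0 - density - gamma) * n))   # tiny values to drop
    keep = [False] * n
    for i in order[n_bottom:n - n_top]:
        keep[i] = True
    return keep


# Toy example: 100 deltas with magnitudes 1..100 (hypothetical values).
deltas = [float(i) for i in range(1, 101)]
mask = breadcrumbs_mask(deltas)
kept = [d for d, k in zip(deltas, mask) if k]
print(len(kept), min(kept), max(kept))
```

With the values used here (`density: 0.9`, `gamma: 0.01`), that reading keeps 90% of each delta tensor and trims from both ends, mostly from the small-magnitude end.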
### Config

```yaml
slices:
- sources:
  - layer_range: [0, 16]
    model: ArliAI/ArliAI-Llama-3-8B-Formax-v1.0
- sources:
  - layer_range: [16, 32]
    model: gradientai/Llama-3-8B-Instruct-Gradient-1048k
parameters:
  int8_mask: true
merge_method: passthrough
dtype: float32
out_dtype: bfloat16
name: formax.ext
---
models:
  - model: Sao10K/L3.1-8B-Niitama-v1.1
  - model: Sao10K/L3-8B-Stheno-v3.3-32K
  - model: Sao10K/L3-8B-Tamamo-v1
base_model: Edgerunners/Lyraea-large-llama-3.1
parameters:
  normalize: false
  int8_mask: true
merge_method: model_stock
dtype: float32
out_dtype: bfloat16
name: siithamol3.1
---
models:
  - model: siithamol3.1
    parameters:
      weight: [0.5, 0.8, 0.9, 1]
      density: 0.9
      gamma: 0.01
  - model: formax.ext
    parameters:
      weight: [0.5, 0.2, 0.1, 0]
      density: 0.9
      gamma: 0.01
base_model: siithamol3.1
parameters:
  normalize: false
  int8_mask: true
merge_method: breadcrumbs
dtype: float32
out_dtype: bfloat16
```
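The `weight` lists in the last merge are gradients: as far as I understand mergekit, a short list of values gets spread across the layer stack and interpolated in between. Here's a rough sketch of that idea, assuming plain linear interpolation over 32 layers. The helper `layer_weights` is hypothetical and mergekit's exact anchoring may differ.

```python
def layer_weights(gradient, num_layers=32):
    """Linearly interpolate a short list of anchor values across all layers."""
    if len(gradient) == 1:
        return [float(gradient[0])] * num_layers
    out = []
    for layer in range(num_layers):
        # Position of this layer on the anchor axis, from 0 to len(gradient) - 1.
        pos = layer / (num_layers - 1) * (len(gradient) - 1)
        lo = min(int(pos), len(gradient) - 2)
        frac = pos - lo
        out.append(gradient[lo] * (1 - frac) + gradient[lo + 1] * frac)
    return out


# Weight gradients copied from the final breadcrumbs merge in the config.
siithamo = layer_weights([0.5, 0.8, 0.9, 1])
formax = layer_weights([0.5, 0.2, 0.1, 0])

for layer in (0, 15, 31):
    print(layer, round(siithamo[layer], 3), round(formax[layer], 3))
```

Under that assumption, formax.ext has the most pull near the first layers and none at the last layer, and the two weights add up to 1 at every layer even with `normalize: false`.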