|
--- |
|
base_model: |
|
- ArliAI/ArliAI-Llama-3-8B-Formax-v1.0 |
|
- Sao10K/L3.1-8B-Niitama-v1.1 |
|
- Sao10K/L3-8B-Tamamo-v1 |
|
- Sao10K/L3-8B-Stheno-v3.3-32K |
|
- Edgerunners/Lyraea-large-llama-3.1 |
|
- gradientai/Llama-3-8B-Instruct-Gradient-1048k |
|
library_name: transformers |
|
tags: |
|
- mergekit |
|
- merge |
|
--- |
|
Second (third) time's the charm. After fighting with Formax trying to increase its max context to something that isn't 4k, I spat out this merge as a result. It still maintains a lot of v0.1's properties: creativity, literacy, and chattiness. Knowing everything I've learned making this, time to dive headfirst into making an L3.1 space whale.
|
|
|
I stg LLMs are testing me. |
|
|
|
### Quants |
|
|
|
[OG Q8 GGUF](https://huggingface.co/kromquant/L3.1-Siithamo-v0.2b-8B-Q8-GGUF) by me. |
|
|
|
[GGUFs](https://huggingface.co/mradermacher/L3.1-Siithamo-v0.2-8B-GGUF) by [mradermacher](https://huggingface.co/mradermacher).
|
|
|
### Details & Recommended Settings |
|
|
|
Outputs a lot, pretty fucking chatty like Stheno. Pulls some chaotic creativity from Niitama, but it's mellowed out by Tamamo. Flowery, dramatic writing at times.

Starts repeating around 8k at basic settings, but DRY eliminates it and the model can handle 32k context. Very good instruction following.
|
|
|
Rec. Settings: |
|
```
Template: L3
Temperature: 1.4
Min P: 0.1
Repeat Penalty: 1.05
Repeat Penalty Tokens: 256
Dyn Temp: 0.9-1.05 at 0.1
Smooth Sampl: 0.18
```
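
If you run the GGUF through llama.cpp directly, the core samplers above map roughly to the flags below. This is a sketch, not a tested command: the filename is illustrative, dynamic temperature and smoothing are configured differently (or not at all) depending on the build and front end, and the DRY value shown is a common default rather than anything tuned for this model.

```
# core samplers only; dyn temp / smoothing omitted (front-end specific)
./llama-server -m L3.1-Siithamo-v0.2b-8B-Q8_0.gguf -c 32768 \
  --temp 1.4 --min-p 0.1 \
  --repeat-penalty 1.05 --repeat-last-n 256 \
  --dry-multiplier 0.8
```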
|
|
|
### Models Merged & Merge Theory |
|
|
|
The following models were included in the merge: |
|
* [Edgerunners/Lyraea-large-llama-3.1](https://huggingface.co/Edgerunners/Lyraea-large-llama-3.1) |
|
* [Sao10K/L3-8B-Stheno-v3.3-32K](https://huggingface.co/Sao10K/L3-8B-Stheno-v3.3-32K) |
|
* [Sao10K/L3.1-8B-Niitama-v1.1](https://huggingface.co/Sao10K/L3.1-8B-Niitama-v1.1) |
|
* [Sao10K/L3-8B-Tamamo-v1](https://huggingface.co/Sao10K/L3-8B-Tamamo-v1) |
|
* [ArliAI/ArliAI-Llama-3-8B-Formax-v1.0](https://huggingface.co/ArliAI/ArliAI-Llama-3-8B-Formax-v1.0) |
|
* [gradientai/Llama-3-8B-Instruct-Gradient-1048k](https://huggingface.co/gradientai/Llama-3-8B-Instruct-Gradient-1048k) |
|
|
|
Compared to v0.1, the siithamol3.1 part stayed the same. To 'increase' the context of Formax, I just chopped off the latter half and replaced it with a ~1M context model, and that seemed to do the trick (after doing a bunch of other shit, this was the simplest and easiest route). Then I changed from dare_linear to breadcrumbs for the final merge, which gave better output without the hassle; see the config below. Again, anything TIES-based didn't work nearly as well.
|
|
|
### Config |
|
|
|
```yaml |
|
slices:
  - sources:
      - layer_range: [0, 16]
        model: ArliAI/ArliAI-Llama-3-8B-Formax-v1.0
  - sources:
      - layer_range: [16, 32]
        model: gradientai/Llama-3-8B-Instruct-Gradient-1048k
parameters:
  int8_mask: true
merge_method: passthrough
dtype: float32
out_dtype: bfloat16
name: formax.ext
---
models:
  - model: Sao10K/L3.1-8B-Niitama-v1.1
  - model: Sao10K/L3-8B-Stheno-v3.3-32K
  - model: Sao10K/L3-8B-Tamamo-v1
base_model: Edgerunners/Lyraea-large-llama-3.1
parameters:
  normalize: false
  int8_mask: true
merge_method: model_stock
dtype: float32
out_dtype: bfloat16
name: siithamol3.1
---
models:
  - model: siithamol3.1
    parameters:
      weight: [0.5, 0.8, 0.9, 1]
      density: 0.9
      gamma: 0.01
  - model: formax.ext
    parameters:
      weight: [0.5, 0.2, 0.1, 0]
      density: 0.9
      gamma: 0.01
base_model: siithamol3.1
parameters:
  normalize: false
  int8_mask: true
merge_method: breadcrumbs
dtype: float32
out_dtype: bfloat16
|
``` |
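
Since this is a multi-stage config (three documents separated by `---`, with `name:` targets feeding later stages), it's meant to be run through mergekit's multi-document entry point. A rough sketch, assuming a recent mergekit install; the config filename and output path are illustrative:

```
pip install mergekit
# mergekit-mega builds the named intermediates (formax.ext, siithamol3.1)
# first, then runs the final breadcrumbs merge
mergekit-mega siithamo-v0.2.yml ./L3.1-Siithamo-v0.2-8B --cuda
```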