---
base_model:
- ArliAI/ArliAI-Llama-3-8B-Formax-v1.0
- Sao10K/L3.1-8B-Niitama-v1.1
- Sao10K/L3-8B-Tamamo-v1
- Sao10K/L3-8B-Stheno-v3.3-32K
- Edgerunners/Lyraea-large-llama-3.1
- gradientai/Llama-3-8B-Instruct-Gradient-1048k
library_name: transformers
tags:
- mergekit
- merge
---
Second (third) time's the charm. After fighting with Formax to raise its max context to something that isn't 4k, this merge is what got spat out. It still maintains a lot of v0.1's properties: creativity, literacy, and chattiness. With everything I've learned making this, time to dive headfirst into making an L3.1 space whale.
I stg LLMs are testing me.
### Quants
[OG Q8 GGUF](https://huggingface.co/kromquant/L3.1-Siithamo-v0.2b-8B-Q8-GGUF) by me.
[GGUFs](https://huggingface.co/mradermacher/L3.1-Siithamo-v0.2-8B-GGUF) by [mradermacher](https://huggingface.co/mradermacher).
### Details & Recommended Settings
(Still testing; details subject to change)
Outputs a lot and is pretty chatty, like Stheno. Pulls some chaotic creativity from Niitama, but it's mellowed out by Tamamo. The writing is a little cliche, but almost endearing in a way.
Sticks to instructs fairly well and at times matches {user}'s input in length and verbosity. Well balanced across all RP uses.
I've tested this model up to 8-9k context without any repetition, but I don't know what its true context limit is yet.
Rec. Settings:
```
Template: L3
Temperature: 1.4
Min P: 0.1
Repeat Penalty: 1.05
Repeat Penalty Tokens: 256
```
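If you want to wire these settings up programmatically, here's a minimal sketch using llama-cpp-python with the Q8 GGUF linked above. The file name, context size, system prompt, and user turn are placeholders, and sampler parameter names can differ slightly between llama.cpp frontends.
```python
# Sketch only: applying the recommended samplers via llama-cpp-python.
from llama_cpp import Llama

llm = Llama(
    model_path="L3.1-Siithamo-v0.2b-8B-Q8_0.gguf",  # placeholder file name
    n_ctx=8192,               # tested up to ~8-9k context
    last_n_tokens_size=256,   # "Repeat Penalty Tokens: 256"
)

# "Template: L3" = the Llama 3 Instruct prompt format.
prompt = (
    "<|begin_of_text|><|start_header_id|>system<|end_header_id|>\n\n"
    "You are {char}, roleplaying with {user}.<|eot_id|>"
    "<|start_header_id|>user<|end_header_id|>\n\n"
    "Describe the tavern we just walked into.<|eot_id|>"
    "<|start_header_id|>assistant<|end_header_id|>\n\n"
)

out = llm(
    prompt,
    max_tokens=512,
    temperature=1.4,
    min_p=0.1,
    repeat_penalty=1.05,
    stop=["<|eot_id|>"],
)
print(out["choices"][0]["text"])
```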
### Models Merged & Merge Theory
The following models were included in the merge:
* [Edgerunners/Lyraea-large-llama-3.1](https://huggingface.co/Edgerunners/Lyraea-large-llama-3.1)
* [Sao10K/L3-8B-Stheno-v3.3-32K](https://huggingface.co/Sao10K/L3-8B-Stheno-v3.3-32K)
* [Sao10K/L3.1-8B-Niitama-v1.1](https://huggingface.co/Sao10K/L3.1-8B-Niitama-v1.1)
* [Sao10K/L3-8B-Tamamo-v1](https://huggingface.co/Sao10K/L3-8B-Tamamo-v1)
* [ArliAI/ArliAI-Llama-3-8B-Formax-v1.0](https://huggingface.co/ArliAI/ArliAI-Llama-3-8B-Formax-v1.0)
* [gradientai/Llama-3-8B-Instruct-Gradient-1048k](https://huggingface.co/gradientai/Llama-3-8B-Instruct-Gradient-1048k)
Compared to v0.1, the siithamol3.1 part stayed the same. To 'increase' Formax's context, I just chopped off its latter half and replaced it with the layers of a ~1M context model, and that seemed to do the trick (after trying a bunch of other shit, this was the simplest and easiest route). Then I changed from dare_linear to breadcrumbs for the final merge, which gave better output without the hassle. Again, anything TIES-based didn't work nearly as well.
### Config
```yaml
slices:
- sources:
  - layer_range: [0, 16]
    model: ArliAI/ArliAI-Llama-3-8B-Formax-v1.0
- sources:
  - layer_range: [16, 32]
    model: gradientai/Llama-3-8B-Instruct-Gradient-1048k
parameters:
  int8_mask: true
merge_method: passthrough
dtype: float32
out_dtype: bfloat16
name: formax.ext
---
models:
- model: Sao10K/L3.1-8B-Niitama-v1.1
- model: Sao10K/L3-8B-Stheno-v3.3-32K
- model: Sao10K/L3-8B-Tamamo-v1
base_model: Edgerunners/Lyraea-large-llama-3.1
parameters:
  normalize: false
  int8_mask: true
merge_method: model_stock
dtype: float32
out_dtype: bfloat16
name: siithamol3.1
---
models:
- model: siithamol3.1
  parameters:
    weight: [0.5, 0.8, 0.9, 1]
    density: 0.9
    gamma: 0.01
- model: formax.ext
  parameters:
    weight: [0.5, 0.2, 0.1, 0]
    density: 0.9
    gamma: 0.01
base_model: siithamol3.1
parameters:
  normalize: false
  int8_mask: true
merge_method: breadcrumbs
dtype: float32
out_dtype: bfloat16
```
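The three YAML documents above are separate stages: the passthrough frankenmerge (formax.ext), the model_stock merge (siithamol3.1), and the final breadcrumbs merge that combines them. Below is a rough sketch of running one stage through mergekit's Python API; file names, the output path, and the options are placeholders, and depending on your mergekit version you may need to split each stage into its own file and run them in order, feeding the intermediate outputs into the final merge. The same can be done from the CLI with `mergekit-yaml config.yml ./output-dir`.
```python
# Sketch only: running one merge stage with mergekit's Python API.
import yaml
import torch

from mergekit.config import MergeConfiguration
from mergekit.merge import MergeOptions, run_merge

with open("siithamol31.yml", encoding="utf-8") as fp:   # placeholder file name
    merge_config = MergeConfiguration.model_validate(yaml.safe_load(fp))

run_merge(
    merge_config,
    "./siithamol3.1",                     # output directory for this stage
    options=MergeOptions(
        cuda=torch.cuda.is_available(),   # merge on GPU if available
        copy_tokenizer=True,
        lazy_unpickle=True,
        low_cpu_memory=False,
    ),
)
```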