metadata
base_model:
  - ArliAI/ArliAI-Llama-3-8B-Formax-v1.0
  - Sao10K/L3.1-8B-Niitama-v1.1
  - Sao10K/L3-8B-Tamamo-v1
  - Sao10K/L3-8B-Stheno-v3.3-32K
  - Edgerunners/Lyraea-large-llama-3.1
  - gradientai/Llama-3-8B-Instruct-Gradient-1048k
library_name: transformers
tags:
  - mergekit
  - merge

Second (third) time's the charm. After fighting with Formax trying to increase its max context to something that isn't 4k, I spat out this merge as a result. It still maintains a lot of v0.1's properties: creativity, literacy, and chattiness. Knowing everything I've learned making this, time to dive headfirst into making an L3.1 space whale.

I stg LLMs are testing me.

Quants

OG Q8 GGUF by me.

GGUFs by mradermacher

Details & Recommended Settings

(Still testing; details subject to change)

Outputs a lot, pretty chatty like Stheno. Pulls some chaotic creativity from Niitama, but it's mellowed out with Tamamo. The writing is a little cliché, but it's almost endearing in a way. Sticks to instructions fairly well and at times changes to match {user}'s input in length and verbosity. Well balanced across all RP uses.

I've tested this model up to 8-9k context without any repetition, but idk what its true context limit is yet.

Rec. Settings:

Template: L3
Temperature: 1.4
Min P: 0.1
Repeat Penalty: 1.05
Repeat Penalty Tokens: 256
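
For anyone wondering why a temperature as high as 1.4 stays coherent with Min P at 0.1, here's a rough sketch of how the two interact. It assumes the min-p filter runs before temperature (sampler order differs between backends, so treat that as an assumption), and `softmax`, `min_p_filter`, and `sampling_distribution` are made-up helper names, not any library's API.

```python
import math

def softmax(logits, temperature=1.0):
    """Convert logits to probabilities, optionally scaled by temperature."""
    scaled = [x / temperature for x in logits]
    peak = max(scaled)  # subtract the max for numerical stability
    exps = [math.exp(x - peak) for x in scaled]
    total = sum(exps)
    return [e / total for e in exps]

def min_p_filter(logits, min_p=0.1):
    """Keep token indices whose probability is at least min_p * top probability."""
    probs = softmax(logits)
    threshold = min_p * max(probs)
    return [i for i, p in enumerate(probs) if p >= threshold]

def sampling_distribution(logits, temperature=1.4, min_p=0.1):
    """Assumed order: min-p prunes unlikely tokens first, then temperature
    flattens the distribution over the survivors."""
    keep = min_p_filter(logits, min_p)
    kept_probs = softmax([logits[i] for i in keep], temperature)
    return dict(zip(keep, kept_probs))

# Toy logits for a 4-token vocabulary (illustrative values only).
toy_logits = [5.0, 4.0, 2.0, -1.0]
dist = sampling_distribution(toy_logits)
print(dist)
```

The low-probability tail gets cut by Min P no matter how hot the temperature is, so the extra randomness only gets spread over tokens that were plausible to begin with.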

Models Merged & Merge Theory

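The following models were included in the merge:

- ArliAI/ArliAI-Llama-3-8B-Formax-v1.0
- Sao10K/L3.1-8B-Niitama-v1.1
- Sao10K/L3-8B-Tamamo-v1
- Sao10K/L3-8B-Stheno-v3.3-32K
- Edgerunners/Lyraea-large-llama-3.1
- gradientai/Llama-3-8B-Instruct-Gradient-1048k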

Compared to v0.1, the siithamol3.1 part stayed the same. To 'increase' the context of Formax, I just chopped off the latter half and replaced it with a ~1M context model, and that seemed to do the trick (after trying a bunch of other shit, this was the simplest and easiest route). Then I changed the final merge from dare_linear to breadcrumbs, which gave a better output without the hassle. Again, anything TIES-based didn't work nearly as well.
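
As a rough illustration of what breadcrumbs does with density: 0.9 and gamma: 0.01, here's a minimal sketch. It goes by mergekit's documented meaning of the parameters (gamma is the fraction of largest-magnitude differences from the base model that get dropped, density is the fraction that survives). A flat Python list stands in for a weight tensor, and `breadcrumbs_sparsify` is a hypothetical helper, not mergekit's actual code.

```python
def breadcrumbs_sparsify(delta, density=0.9, gamma=0.01):
    """Zero out the largest-magnitude `gamma` fraction and the smallest
    remaining entries so that a `density` fraction of deltas is kept."""
    n = len(delta)
    n_top = round(gamma * n)                # largest outliers to drop
    n_keep = round(density * n)             # entries that survive
    n_bottom = max(n - n_top - n_keep, 0)   # smallest entries to drop
    # Indices sorted by ascending magnitude.
    order = sorted(range(n), key=lambda i: abs(delta[i]))
    kept = set(order[n_bottom:n - n_top])
    return [d if i in kept else 0.0 for i, d in enumerate(delta)]

# Toy "task vector": 100 differences from the base model, magnitudes 0.001..0.100.
delta = [(-1) ** i * (i + 1) * 0.001 for i in range(100)]
sparse = breadcrumbs_sparsify(delta)
print(sum(1 for x in sparse if x != 0.0), "of", len(delta), "deltas kept")
```

With these settings only a thin slice gets trimmed on either end: the top 1% of outliers and the smallest 9% of changes, leaving the middle 90% untouched.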

Config

slices:
- sources:
  - layer_range: [0, 16]
    model: ArliAI/ArliAI-Llama-3-8B-Formax-v1.0
- sources:
  - layer_range: [16, 32]
    model: gradientai/Llama-3-8B-Instruct-Gradient-1048k
parameters:
  int8_mask: true
merge_method: passthrough
dtype: float32
out_dtype: bfloat16
name: formax.ext
---
models:
    - model: Sao10K/L3.1-8B-Niitama-v1.1
    - model: Sao10K/L3-8B-Stheno-v3.3-32K
    - model: Sao10K/L3-8B-Tamamo-v1
base_model: Edgerunners/Lyraea-large-llama-3.1
parameters:
  normalize: false
  int8_mask: true
merge_method: model_stock
dtype: float32
out_dtype: bfloat16
name: siithamol3.1
---
models:
  - model: siithamol3.1
    parameters:
      weight: [0.5, 0.8, 0.9, 1]
      density: 0.9
      gamma: 0.01
  - model: formax.ext
    parameters:
      weight: [0.5, 0.2, 0.1, 0]
      density: 0.9
      gamma: 0.01
base_model: siithamol3.1
parameters:
  normalize: false
  int8_mask: true
merge_method: breadcrumbs
dtype: float32
out_dtype: bfloat16
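
The list-valued weight entries in the final merge are layer gradients: mergekit spreads them across the layer stack instead of using one number everywhere. A minimal sketch of how the two lists play out over 32 layers, assuming plain linear interpolation between evenly spaced anchor points (mergekit's exact interpolation may differ in detail); `layer_gradient` is a hypothetical helper name.

```python
def layer_gradient(anchors, num_layers):
    """Linearly interpolate a short list of anchor values across num_layers layers."""
    if num_layers == 1:
        return [float(anchors[0])]
    values = []
    for layer in range(num_layers):
        # Position of this layer along the anchor list, from 0 to len(anchors) - 1.
        pos = layer / (num_layers - 1) * (len(anchors) - 1)
        lo = int(pos)
        hi = min(lo + 1, len(anchors) - 1)
        frac = pos - lo
        values.append(anchors[lo] * (1 - frac) + anchors[hi] * frac)
    return values

NUM_LAYERS = 32  # matches the layer_range [0, 32] span used in the passthrough config
siithamo_w = layer_gradient([0.5, 0.8, 0.9, 1], NUM_LAYERS)
formax_w = layer_gradient([0.5, 0.2, 0.1, 0], NUM_LAYERS)

for layer in (0, 10, 21, 31):
    print(layer, round(siithamo_w[layer], 3), round(formax_w[layer], 3))
```

Under that assumption the two weights sum to 1 at every layer: the early layers are an even split, and formax.ext fades out to nothing by the last layer.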