
I made a really stupid mistake and uploaded two models instead of one. I had uploaded the files for both and was going to decide today which one to release, but I got up at 4 or 5 am, immediately got on my PC, and set both to public after writing more of the model card. Hopefully no one downloaded the extra one, but if you did, I'm sorry for the inconvenience.

Llama-3-8B-Stroganoff-4.0

Since V3, I've tested a lot of old models, looked at some new ones, and tried every merge method available in mergekit. This one came out of experiments I was doing on model order, which is why all the models use the same parameters, but it turned out good enough that I decided to upload it. If you've been doing merges yourself, most or all of the following information will be redundant, but some of it was not at all apparent to me, so I hope it helps others looking for more information.

Ties is not better than Task-Arithmetic, and Task-Arithmetic is not better than Ties; each has advantages that make it the better choice in different situations. Ties aims to reduce interference between models by keeping the weight changes that agree with each other in sign and zeroing out the rest. If you use Ties with a bunch of models that do different things, some aspects of a model can get erased if they don't have a strong enough presence. The order of the models does not matter with a Ties merge because all of the merging happens in one step, and changing the model order will produce identical hashes, assuming you're not using Dare or Della, which add randomness to the merge.
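The three steps Ties performs (trim each model's changes, elect one sign per weight, then average only the values that agree with that sign) can be sketched in plain Python. This is a simplified illustration following the TIES-Merging paper, working on flat lists of floats rather than tensors; it is not mergekit's implementation, and the function names plus the absence of per-model weights are simplifications made for this sketch.

```python
import math


def trim(delta, density):
    """Keep the top `density` fraction of entries by magnitude; zero the rest."""
    k = max(1, round(len(delta) * density))
    ranked = sorted(range(len(delta)), key=lambda i: abs(delta[i]), reverse=True)
    keep = set(ranked[:k])
    return [d if i in keep else 0.0 for i, d in enumerate(delta)]


def ties_merge(base, finetuned, density=0.5):
    """Simplified Ties merge of flat parameter lists (no weights, no scaling)."""
    # 1. Task vectors (fine-tuned minus base), trimmed to the largest changes.
    deltas = [trim([f - x for f, x in zip(model, base)], density) for model in finetuned]
    merged = []
    for i, base_value in enumerate(base):
        column = [d[i] for d in deltas]
        # 2. Elect one sign per parameter from the summed trimmed deltas.
        #    math.fsum is exactly rounded, so it ignores summation order.
        sign = 1.0 if math.fsum(column) >= 0 else -1.0
        # 3. Disjoint mean: average only nonzero values that agree with the sign.
        #    Values with the opposite sign are dropped, which is where a weakly
        #    represented model's contribution can disappear.
        agreeing = [v for v in column if v != 0.0 and (v > 0) == (sign > 0)]
        mean = math.fsum(agreeing) / len(agreeing) if agreeing else 0.0
        merged.append(base_value + mean)
    return merged


base = [0.0, 0.0, 0.0, 0.0]
model_a = [1.0, -0.5, 0.25, 0.0]
model_b = [0.5, 0.5, -1.0, 0.0]
model_c = [-0.25, 0.75, 0.5, 2.0]

print(ties_merge(base, [model_a, model_b, model_c]))  # [0.75, 0.75, -1.0, 2.0]
print(ties_merge(base, [model_c, model_b, model_a]))  # same result in any order
```

Every step operates on each parameter's full column of values at once, so reordering the input models cannot change the output.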

Task-Arithmetic is a linear merge that first subtracts the base model from the fine-tuned models and then merges them in pairs starting at the top of the list before finally merging the result back on top of the base model. The order of the models does matter with a Task-Arithmetic merge, and changing the model order will produce different hashes. A Task-Arithmetic merge keeps more of the individuality of the component models, with the last to be merged having the strongest effect on the resulting model. Task-Arithmetic can be unpredictable at times, as changing the order of the models can produce significantly different results, but it can be effective at combining the strengths of different models once you find the right order.
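For comparison, the textbook form of task arithmetic (Ilharco et al., "Editing Models with Task Arithmetic") adds a weighted sum of task vectors back onto the base model. The sketch below assumes that form plus a `normalize` option that divides by the total weight, which is an assumption about what mergekit's `normalize: true` does. It does not reproduce the pairwise, order-sensitive procedure described above; it only shows the basic arithmetic, and that with ordinary floating-point addition the accumulation order alone is enough to change the bits of the output, and with them the file hash.

```python
def task_arithmetic(base, finetuned, weights, normalize=True):
    """Weighted sum of task vectors added back onto the base (textbook form)."""
    # Task vector: fine-tuned weights minus base weights.
    deltas = [[f - x for f, x in zip(model, base)] for model in finetuned]
    # Assumption: normalize divides by the total weight, like a weighted average.
    total = sum(weights) if normalize else 1.0
    merged = []
    for i, base_value in enumerate(base):
        mixed = 0.0
        # Plain float accumulation in list order: reordering the models can
        # change the rounding of this sum even though the math is identical.
        for w, d in zip(weights, deltas):
            mixed += w * d[i]
        merged.append(base_value + mixed / total)
    return merged


base = [0.0]
forward = task_arithmetic(base, [[0.1], [0.2], [0.3]], [1, 1, 1], normalize=False)
reverse = task_arithmetic(base, [[0.3], [0.2], [0.1]], [1, 1, 1], normalize=False)
print(forward, reverse)  # [0.6000000000000001] [0.6]
```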

Dare, Della, and Breadcrumbs are all enhancements to Ties and Task-Arithmetic that aim to improve the resulting merge by zeroing out certain weights. While they all remove weights before merging takes place, each does it a bit differently. Dare uses a flat dropout rate, meaning all weights have an equal chance of being dropped; Della scales the dropout rate based on the magnitude of change from the base model, with the largest changes having the smallest dropout rate; and Breadcrumbs first removes any outliers and then zeroes out weights, starting with the smallest changes, until it reaches the target density. I've done direct comparisons between Dare and Della with all the same parameters, and Della has consistently outperformed Dare. I haven't tested Breadcrumbs much, but the idea behind it seems solid.
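To make the differences concrete, here is a sketch of the three dropping rules applied to a single flat task vector. The function names, the `epsilon` spread for Della, and `outlier_fraction` for Breadcrumbs are illustrative assumptions (mergekit exposes its own parameter names), and the linear keep-probability schedule for Della is an assumed simplification of its magnitude-based ranking, not mergekit's exact code. The rescaling of surviving weights follows the DARE paper.

```python
import random


def dare(delta, density, rng):
    """Dare: every entry has the same keep probability (`density`); survivors
    are rescaled by 1 / density, as in the DARE paper."""
    return [d / density if rng.random() < density else 0.0 for d in delta]


def della(delta, density, epsilon, rng):
    """Della-style: rank entries by |delta| and give bigger changes a higher keep
    probability, spread linearly from density - epsilon to density + epsilon
    (assumed schedule). Survivors are rescaled by 1 / keep probability."""
    n = len(delta)
    ranked = sorted(range(n), key=lambda i: abs(delta[i]))  # smallest change first
    out = [0.0] * n
    for rank, i in enumerate(ranked):
        keep = density - epsilon + 2 * epsilon * rank / max(1, n - 1)
        if rng.random() < keep:
            out[i] = delta[i] / keep
    return out


def breadcrumbs(delta, density, outlier_fraction):
    """Breadcrumbs-style: drop the largest `outlier_fraction` of changes, then keep
    the biggest remaining ones until `density` of all entries survive, so the
    smallest changes are zeroed first. No randomness and no rescaling."""
    n = len(delta)
    ranked = sorted(range(n), key=lambda i: abs(delta[i]), reverse=True)
    n_outliers = int(n * outlier_fraction)
    n_keep = round(n * density)
    keep = set(ranked[n_outliers:n_outliers + n_keep])
    return [d if i in keep else 0.0 for i, d in enumerate(delta)]


rng = random.Random(0)
delta = [5.0, -0.1, 3.0, 0.2, -2.0, 1.0, 0.05, 4.0, -0.5, 0.3]
print(dare(delta, 0.55, rng))
print(della(delta, 0.55, 0.1, rng))
print(breadcrumbs(delta, 0.5, 0.1))  # [0.0, 0.0, 3.0, 0.0, -2.0, 1.0, 0.0, 4.0, -0.5, 0.0]
```

Dare and Della are random, so they are the two that break the "same order, same hash" property unless the random seed is fixed; Breadcrumbs as sketched here is deterministic.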

Quantization Formats

GGUF

Details

  • License: llama3
  • Instruct Format: llama-3 or ChatML
  • Context Size: 8K

Models Used

Merge Config

merge_method: della_linear
dtype: bfloat16
parameters:
  normalize: true
  int8_mask: true
tokenizer_source: union
base_model: SicariusSicariiStuff/Dusk_Rainbow
models:
    - model: ArliAI/ArliAI-Llama-3-8B-Formax-v1.0
      parameters:
        density: 0.55
        weight: 1
    - model: Sao10K/L3-8B-Stheno-v3.2
      parameters:
        density: 0.55
        weight: 1
    - model: Nitral-AI/Hathor_Sofit-L3-8B-v1
      parameters:
        density: 0.55
        weight: 1
    - model: TheDrummer/Llama-3SOME-8B-v2
      parameters:
        density: 0.55
        weight: 1
    - model: hf-100/Llama-3-Spellbound-Instruct-8B-0.3
      parameters:
        density: 0.55
        weight: 1
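Because every model in this config uses identical parameters, the only thing that differs between them is the order they're listed in. The sketch below shows what those parameters mean numerically, assuming `normalize: true` divides the weighted sum of task vectors by the total weight (so five models at weight 1 each contribute one fifth) and that `density: 0.55` means roughly 55% of each model's changed weights survive Della's dropout on average. The `effective_scales` helper is hypothetical and not part of mergekit.

```python
# Parameters copied from the merge config above.
models = {
    "ArliAI/ArliAI-Llama-3-8B-Formax-v1.0": {"density": 0.55, "weight": 1},
    "Sao10K/L3-8B-Stheno-v3.2": {"density": 0.55, "weight": 1},
    "Nitral-AI/Hathor_Sofit-L3-8B-v1": {"density": 0.55, "weight": 1},
    "TheDrummer/Llama-3SOME-8B-v2": {"density": 0.55, "weight": 1},
    "hf-100/Llama-3-Spellbound-Instruct-8B-0.3": {"density": 0.55, "weight": 1},
}


def effective_scales(models, normalize=True):
    """Hypothetical helper: scale applied to each model's sparsified task vector.
    Assumption: normalize divides the weighted sum by the total weight."""
    total = sum(p["weight"] for p in models.values())
    return {
        name: (p["weight"] / total if normalize else p["weight"])
        for name, p in models.items()
    }


for name, scale in effective_scales(models).items():
    kept = models[name]["density"]
    print(f"{name}: scale {scale}, ~{kept:.0%} of its task vector kept")
```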
Safetensors
Model size: 8.03B params
Tensor type: BF16

Model tree for HiroseKoichi/Llama-3-8B-Stroganoff-4.0