---
license: cc-by-nc-4.0
tags:
- merge
---
## Description
This repo contains the bf16 weights of Nyxene-11B. It follows the same recipe as [OmniMix](https://huggingface.co/Undi95/Mistral-11B-OmniMix), but is built from a new set of models.
## Models used
- [berkeley-nest/Starling-LM-7B-alpha](https://huggingface.co/berkeley-nest/Starling-LM-7B-alpha)
- [mlabonne/NeuralHermes-2.5-Mistral-7B](https://huggingface.co/mlabonne/NeuralHermes-2.5-Mistral-7B)
- [fblgit/juanako-7b-UNA](https://huggingface.co/fblgit/juanako-7b-UNA)
- [ehartford/dolphin-2.1-mistral-7b](https://huggingface.co/ehartford/dolphin-2.1-mistral-7b)
## Prompt template
After further testing, this template gave the best results:
```
<|system|>
Below is an instruction that describes a task. Write a response that appropriately completes the request.
<|user|>
{prompt}
<|assistant|>
```
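As a minimal sketch, the template above can be filled in with a small helper. The function name `build_prompt`, the `DEFAULT_SYSTEM` constant, and the trailing newline after `<|assistant|>` are assumptions made for illustration; they are not part of the model or of any library.

```python
# Hypothetical helper: names and the trailing newline are illustrative only.
DEFAULT_SYSTEM = (
    "Below is an instruction that describes a task. "
    "Write a response that appropriately completes the request."
)


def build_prompt(prompt: str, system: str = DEFAULT_SYSTEM) -> str:
    """Fill the template, ending right after the assistant tag."""
    return f"<|system|>\n{system}\n<|user|>\n{prompt}\n<|assistant|>\n"


print(build_prompt("Name three primary colors."))
```

The returned string is what would be passed to the tokenizer; the model is expected to continue after the `<|assistant|>` line.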
## The secret sauce
dolphin-juanako-11B:
```
slices:
  - sources:
    - model: fblgit/juanako-7b-UNA
      layer_range: [0, 24]
  - sources:
    - model: ehartford/dolphin-2.1-mistral-7b
      layer_range: [8, 32]
merge_method: passthrough
dtype: bfloat16
```
Starling-NeuralHermes-11B:
```
slices:
  - sources:
    - model: berkeley-nest/Starling-LM-7B-alpha
      layer_range: [0, 24]
  - sources:
    - model: mlabonne/NeuralHermes-2.5-Mistral-7B
      layer_range: [8, 32]
merge_method: passthrough
dtype: bfloat16
```
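The two passthrough configs above each stack 24 layers from one model on top of 24 layers from another. The sketch below only does the layer bookkeeping, assuming `layer_range` has an exclusive end (which is consistent with the `[0, 48]` range used in the final merge). The function `stack_slices` is a hypothetical illustration, not mergekit code.

```python
def stack_slices(slices):
    """Return the (model, source_layer) pairs a passthrough merge would stack.

    Each slice is (model_name, start, end) with an exclusive end,
    mirroring layer_range in the configs above (an assumption).
    """
    stacked = []
    for model, start, end in slices:
        stacked.extend((model, i) for i in range(start, end))
    return stacked


layers = stack_slices([
    ("fblgit/juanako-7b-UNA", 0, 24),
    ("ehartford/dolphin-2.1-mistral-7b", 8, 32),
])
print(len(layers))  # 48
print(layers[23])   # ('fblgit/juanako-7b-UNA', 23)
print(layers[24])   # ('ehartford/dolphin-2.1-mistral-7b', 8)
```

Under that assumption, source layers 8 to 23 appear twice in the stack, once from each model, which is how two 32-layer 7B models yield a 48-layer result.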
Nyxene-11B:
```
slices:
  - sources:
    - model: dolphin-juanako-11B
      layer_range: [0, 48]
    - model: Starling-NeuralHermes-11B
      layer_range: [0, 48]
merge_method: slerp
base_model: dolphin-juanako-11B
parameters:
  t:
    - filter: lm_head
      value: [0.75]
    - filter: embed_tokens
      value: [0.75]
    - filter: self_attn
      value: [0.75, 0.25]
    - filter: mlp
      value: [0.25, 0.75]
    - filter: layernorm
      value: [0.5, 0.5]
    - filter: modelnorm
      value: [0.75]
    - value: 0.5 # fallback for rest of tensors
dtype: bfloat16
```
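For readers unfamiliar with slerp, here is a rough, pure-Python sketch of the idea behind the config above. It rests on two assumptions that should be checked against the mergekit documentation: that `t = 0` yields the base model's tensor and `t = 1` the other model's, and that a list such as `[0.75, 0.25]` is interpolated linearly from the first layer to the last. Both `slerp` and `t_for_layer` are hypothetical helpers, not mergekit's implementation.

```python
import math


def slerp(t, v0, v1, eps=1e-8):
    """Spherical linear interpolation between two flat weight vectors."""
    n0 = math.sqrt(sum(x * x for x in v0))
    n1 = math.sqrt(sum(x * x for x in v1))
    if n0 < eps or n1 < eps:
        # Degenerate vector: fall back to plain linear interpolation.
        return [(1 - t) * a + t * b for a, b in zip(v0, v1)]
    dot = sum(a * b for a, b in zip(v0, v1)) / (n0 * n1)
    theta = math.acos(max(-1.0, min(1.0, dot)))
    if abs(math.sin(theta)) < eps:
        # Nearly parallel vectors: linear interpolation is stable here.
        return [(1 - t) * a + t * b for a, b in zip(v0, v1)]
    s0 = math.sin((1 - t) * theta) / math.sin(theta)
    s1 = math.sin(t * theta) / math.sin(theta)
    return [s0 * a + s1 * b for a, b in zip(v0, v1)]


def t_for_layer(values, layer, num_layers):
    """Linearly interpolate a list of anchor values across layer indices."""
    if len(values) == 1 or num_layers == 1:
        return values[0]
    pos = layer / (num_layers - 1) * (len(values) - 1)
    lo = min(int(pos), len(values) - 2)
    frac = pos - lo
    return values[lo] * (1 - frac) + values[lo + 1] * frac


# self_attn uses [0.75, 0.25]: t at the first and last of 48 layers.
print(t_for_layer([0.75, 0.25], 0, 48))   # 0.75
print(t_for_layer([0.75, 0.25], 47, 48))  # 0.25
print(slerp(0.5, [1.0, 0.0], [0.0, 1.0]))
```

Read this way, the config would weight `self_attn` tensors toward Starling-NeuralHermes-11B in early layers and toward dolphin-juanako-11B in late layers, with the `mlp` tensors following the opposite schedule.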
All of the merges described here were done with [mergekit](https://github.com/cg123/mergekit).