kromeurus committed
Commit d81b4ee
1 Parent(s): f139134

Update README.md

Files changed (1)
  1. README.md +103 -47
README.md CHANGED
@@ -1,47 +1,103 @@
- ---
- base_model: []
- library_name: transformers
- tags:
- - mergekit
- - merge
-
- ---
- # merge
-
- This is a merge of pre-trained language models created using [mergekit](https://github.com/cg123/mergekit).
-
- ## Merge Details
- ### Merge Method
-
- This model was merged using the breadcrumbs merge method using parts/siitamol3.1 as a base.
-
- ### Models Merged
-
- The following models were included in the merge:
- * parts/formax.ext
-
- ### Configuration
-
- The following YAML configuration was used to produce this model:
-
- ```yaml
- models:
-   - model: parts/siitamol3.1
-     parameters:
-       weight: [0.5, 0.8, 0.9, 1]
-       density: 0.9
-       gamma: 0.01
-   - model: parts/formax.ext
-     parameters:
-       weight: [0.5, 0.2, 0.1, 0]
-       density: 0.9
-       gamma: 0.01
- base_model: parts/siitamol3.1
- parameters:
-   normalize: false
-   int8_mask: true
- merge_method: breadcrumbs
- dtype: float32
- out_dtype: bfloat16
-
- ```
+ ---
+ base_model:
+ - ArliAI/ArliAI-Llama-3-8B-Formax-v1.0
+ - Sao10K/L3.1-8B-Niitama-v1.1
+ - Sao10K/L3-8B-Tamamo-v1
+ - Sao10K/L3-8B-Stheno-v3.3-32K
+ - Edgerunners/Lyraea-large-llama-3.1
+ - gradientai/Llama-3-8B-Instruct-Gradient-1048k
+ library_name: transformers
+ tags:
+ - mergekit
+ - merge
+ ---
+ Second (third) time's the charm. After fighting with Formax to increase its max context to something that isn't 4k, this merge is what got spat out. Still maintains a
+ lot of v0.1's properties: creativity, literacy, and chattiness. With everything I've learned making this, it's time to dive headfirst into making an L3.1 space whale.
+
+ I stg LLMs are testing me.
+
+ ### Quants
+
+ [OG Q8 GGUF](https://huggingface.co/kromquant/L3.1-Siithamo-v0.2b-8B-Q8-GGUF) by me.
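If you just want the quant, here is a minimal sketch for pulling that GGUF repo with `huggingface_hub`; the repo id comes from the link above, but the download approach is my own illustration, not part of the original card.

```python
# Minimal sketch: download the Q8 GGUF repo linked above via huggingface_hub.
# Requires `pip install huggingface_hub`; returns the local directory it was saved to.
from huggingface_hub import snapshot_download

local_dir = snapshot_download(repo_id="kromquant/L3.1-Siithamo-v0.2b-8B-Q8-GGUF")
print(f"GGUF files downloaded to: {local_dir}")
```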
+
+ ### Details & Recommended Settings
+
+ (Still testing; details subject to change)
+
+ Outputs a lot, pretty chatty like Stheno. Pulls some chaotic creativity from Niitama, but it's mellowed out by Tamamo. The writing is a little cliché, but it's almost endearing in a way.
+ Sticks to instructs fairly well and at times changes to match {user}'s input in length and verbosity. Well balanced across all RP uses.
+
+ I've tested this model up to 8-9k context without any repetition, but I don't know its true context limit yet.
+
+ Rec. Settings:
+ ```
+ Template: L3
+ Temperature: 1.4
+ Min P: 0.1
+ Repeat Penalty: 1.05
+ Repeat Penalty Tokens: 256
+ ```
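As a rough illustration only (not from the original card), here is how those settings might map onto llama-cpp-python using the GGUF downloaded above. The model path is a placeholder, the backend choice is mine, and I'm assuming "Repeat Penalty Tokens" corresponds to llama-cpp-python's `last_n_tokens_size`.

```python
# Hedged sketch: applying the recommended sampler settings with llama-cpp-python.
# Requires `pip install llama-cpp-python`; the GGUF path below is a placeholder.
from llama_cpp import Llama

llm = Llama(
    model_path="path/to/L3.1-Siithamo-v0.2b-8B-Q8.gguf",  # placeholder filename
    n_ctx=8192,              # the card reports clean output up to ~8-9k context
    last_n_tokens_size=256,  # assumed mapping of "Repeat Penalty Tokens: 256"
)

# create_chat_completion uses the chat template embedded in the GGUF (L3 template).
out = llm.create_chat_completion(
    messages=[{"role": "user", "content": "Introduce yourself in character."}],
    temperature=1.4,         # Temperature: 1.4
    min_p=0.1,               # Min P: 0.1
    repeat_penalty=1.05,     # Repeat Penalty: 1.05
    max_tokens=512,
)
print(out["choices"][0]["message"]["content"])
```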
+
+ ### Models Merged & Merge Theory
+
+ The following models were included in the merge:
+ * [Edgerunners/Lyraea-large-llama-3.1](https://huggingface.co/Edgerunners/Lyraea-large-llama-3.1)
+ * [Sao10K/L3-8B-Stheno-v3.3-32K](https://huggingface.co/Sao10K/L3-8B-Stheno-v3.3-32K)
+ * [Sao10K/L3.1-8B-Niitama-v1.1](https://huggingface.co/Sao10K/L3.1-8B-Niitama-v1.1)
+ * [Sao10K/L3-8B-Tamamo-v1](https://huggingface.co/Sao10K/L3-8B-Tamamo-v1)
+ * [ArliAI/ArliAI-Llama-3-8B-Formax-v1.0](https://huggingface.co/ArliAI/ArliAI-Llama-3-8B-Formax-v1.0)
+ * [gradientai/Llama-3-8B-Instruct-Gradient-1048k](https://huggingface.co/gradientai/Llama-3-8B-Instruct-Gradient-1048k)
+
+ Compared to v0.1, the siithamol3.1 part stayed the same. To 'increase' Formax's context, I just chopped off the latter half and replaced it with the layers of a ~1M context model, and that
+ seemed to do the trick (after trying a bunch of other shit, this was the simplest and easiest route). Then I switched from dare_linear to breadcrumbs for the final merge, which gave a
+ better output without the hassle. Again, anything TIES-based didn't work nearly as well.
+
+ ### Config
+
+ ```yaml
+ slices:
+   - sources:
+       - layer_range: [0, 16]
+         model: ArliAI/ArliAI-Llama-3-8B-Formax-v1.0
+   - sources:
+       - layer_range: [16, 32]
+         model: gradientai/Llama-3-8B-Instruct-Gradient-1048k
+ parameters:
+   int8_mask: true
+ merge_method: passthrough
+ dtype: float32
+ out_dtype: bfloat16
+ name: formax.ext
+ ---
+ models:
+   - model: Sao10K/L3.1-8B-Niitama-v1.1
+   - model: Sao10K/L3-8B-Stheno-v3.3-32K
+   - model: Sao10K/L3-8B-Tamamo-v1
+ base_model: Edgerunners/Lyraea-large-llama-3.1
+ parameters:
+   normalize: false
+   int8_mask: true
+ merge_method: model_stock
+ dtype: float32
+ out_dtype: bfloat16
+ name: siithamol3.1
+ ---
+ models:
+   - model: siithamol3.1
+     parameters:
+       weight: [0.5, 0.8, 0.9, 1]
+       density: 0.9
+       gamma: 0.01
+   - model: formax.ext
+     parameters:
+       weight: [0.5, 0.2, 0.1, 0]
+       density: 0.9
+       gamma: 0.01
+ base_model: siithamol3.1
+ parameters:
+   normalize: false
+   int8_mask: true
+ merge_method: breadcrumbs
+ dtype: float32
+ out_dtype: bfloat16
+ ```
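If you want to rerun something like this yourself, below is a rough sketch using mergekit's Python API; it is my own illustration, not the exact commands used for this model. It assumes each YAML document above is saved as its own file (the `final.yaml` name is hypothetical) and that the intermediate `siithamol3.1` / `formax.ext` references are pointed at the earlier stages' output directories, since handling of named multi-document configs varies by mergekit version.

```python
# Hedged sketch: running one stage of the merge above with mergekit's Python API.
# Requires `pip install mergekit`; exact option names can differ between versions.
# "final.yaml" is a hypothetical file holding the last YAML document above.
import yaml

from mergekit.config import MergeConfiguration
from mergekit.merge import MergeOptions, run_merge

with open("final.yaml", "r", encoding="utf-8") as f:
    config = MergeConfiguration.model_validate(yaml.safe_load(f))

run_merge(
    config,
    out_path="./L3.1-Siithamo-v0.2b-8B",  # where the merged weights get written
    options=MergeOptions(
        cuda=True,            # set False for CPU-only merging
        copy_tokenizer=True,
        lazy_unpickle=True,
        low_cpu_memory=True,
    ),
)
```

The same pattern would be repeated for the formax.ext passthrough and the siithamol3.1 model_stock stages before the final breadcrumbs merge.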