gbueno86 committed
Commit 3c77eb1
1 Parent(s): 4edcb2d

Update README.md

Files changed (1):
  1. README.md +108 -20

README.md CHANGED
@@ -1,10 +1,48 @@
+ ---
+ license: llama3.1
+ language:
+ - en
+ ---
+
+ ![image/png](https://cdn-uploads.huggingface.co/production/uploads/649dc85249ae3a68334adcc6/KxaiZ7rDKkYlix99O9j5H.png)
+
+ **Cathallama**
+ =====================================
+
+ Awesome model, my new daily driver.
+
+ **Notable Performance**
+
+ * Overall success rate increase on MMLU-PRO over LLaMA 3.1 70b (51% vs 42% in the 100-question run below)
+ * Strong performance in MMLU-PRO categories overall
+ * Great performance during manual testing
+
+ **Creation workflow**
+ =====================
+ **Models merged**
+ * meta-llama/Meta-Llama-3.1-70B-Instruct
+ * turboderp/Cat-Llama-3-70B-instruct
+ * Nexusflow/Athene-70B
+
+ ```
+ flowchart TD
+ A[Nexusflow_Athene] -->|Merge with| B[Meta-Llama-3.1]
+ C[turboderp_Cat] -->|Merge with| D[Meta-Llama-3.1]
+ B --> E[Merge]
+ D --> E[Merge]
+ E -->|Result| F[Cathallama]
+ ```
+
+
+ ![image/png](https://cdn-uploads.huggingface.co/production/uploads/649dc85249ae3a68334adcc6/bBcB194tAtsZjPUnI1pDQ.png)
+
  **Testing**
  =====================

  **Hyperparameters**
  ---------------

- * **Temperature**: 0.9
+ * **Temperature**: 0.0 for automated runs, 0.9 for manual testing
  * **Penalize repeat sequence**: 1.05
  * **Consider N tokens for penalize**: 256
  * **Penalize repetition of newlines**
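The card doesn't say which tool or merge method produced the pairwise merges shown in the flowchart above. As a purely illustrative sketch, one such step (Athene into Llama 3.1) could look like this with mergekit; the SLERP method, the t=0.5 interpolation factor, the config and output names, and the 80-layer range are all assumptions, not the author's actual recipe.

```
# Sketch only: the actual tool/method behind Cathallama is unspecified.
# Assumes mergekit (pip install mergekit); SLERP and t=0.5 are placeholders.
cat > athene-x-llama31.yaml <<'EOF'
slices:
  - sources:
      - model: Nexusflow/Athene-70B
        layer_range: [0, 80]
      - model: meta-llama/Meta-Llama-3.1-70B-Instruct
        layer_range: [0, 80]
merge_method: slerp
base_model: meta-llama/Meta-Llama-3.1-70B-Instruct
parameters:
  t: 0.5          # interpolation factor, placeholder value
dtype: bfloat16
EOF

# Writes the merged model to ./athene-x-llama31; repeat for the Cat x Llama 3.1
# pair, then merge the two intermediates the same way to mirror the flowchart.
mergekit-yaml athene-x-llama31.yaml ./athene-x-llama31
```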
 
@@ -16,30 +54,80 @@
  ------------------

  * b3527-2-g2d5dd7bb
+ * -fa -ngl -1 -ctk f16 --no-mmap

- **File**
+ **Tested Files**
  ------------------

  * Cathallama-70B.Q4_0.gguf
+ * Nexusflow_Athene-70B.Q4_0.gguf
+ * turboderp_Cat-Llama-3-70B-instruct.Q4_0.gguf
+ * Meta-Llama-3.1-70B-Instruct.Q4_0.gguf

- **Test Cases**
+ **Tests**
  --------------

- | Test Case | Result |
+
+ **Manual testing**
+
+ | Category | Test Case | Cathallama-70B.Q4_0.gguf | Nexusflow_Athene-70B.Q4_0.gguf | turboderp_Cat-Llama-3-70B-instruct.Q4_0.gguf | Meta-Llama-3.1-70B-Instruct.Q4_0.gguf |
+ | --- | --- | --- | --- | --- | --- |
+ | **Common Sense** | Ball on cup | OK | <span style="color: red;">KO</span> | <span style="color: red;">KO</span> | OK |
+ | | Big duck small horse | <span style="color: red;">KO</span> | OK | <span style="color: red;">KO</span> | OK |
+ | | Killers | OK | OK | <span style="color: red;">KO</span> | OK |
+ | | Strawberry r's | OK | <span style="color: red;">KO</span> | <span style="color: red;">KO</span> | <span style="color: red;">KO</span> |
+ | | 9.11 or 9.9 bigger | <span style="color: red;">KO</span> | OK | OK | <span style="color: red;">KO</span> |
+ | | Dragon or lens | <span style="color: red;">KO</span> | <span style="color: red;">KO</span> | <span style="color: red;">KO</span> | <span style="color: red;">KO</span> |
+ | | Shirts | OK | OK | <span style="color: red;">KO</span> | <span style="color: red;">KO</span> |
+ | | Sisters | OK | <span style="color: red;">KO</span> | <span style="color: red;">KO</span> | <span style="color: red;">KO</span> |
+ | | Jane faster | OK | OK | OK | OK |
+ | **Programming** | JSON | OK | OK | OK | OK |
+ | | Python snake game | OK | <span style="color: red;">KO</span> | <span style="color: red;">KO</span> | <span style="color: red;">KO</span> |
+ | **Math** | Door window combination | OK | OK | <span style="color: red;">KO</span> | <span style="color: red;">KO</span> |
+ | **Smoke** | Poem | OK | OK | OK | OK |
+ | | Story | OK | OK | <span style="color: red;">KO</span> | OK |
+
+ *Note: See sample_generations.txt in the main folder of the repo for the raw generations.*
+
+ **MMLU-PRO**
+
+ | Model | Success % |
+ | --- | --- |
+ | Cathallama-70B | **51.0%** |
+ | turboderp_Cat-Llama-3-70B-instruct | 37.0% |
+ | Nexusflow_Athene-70B | 41.0% |
+ | Meta-Llama-3.1-70B-Instruct | 42.0% |
+
+ | MMLU-PRO category | Cathallama-70B.Q4_0.gguf | Nexusflow_Athene-70B.Q4_0.gguf | turboderp_Cat-Llama-3-70B-instruct.Q4_0.gguf | Meta-Llama-3.1-70B-Instruct.Q4_0.gguf |
+ | --- | --- | --- | --- | --- |
+ | Business | **50.0%** | 45.0% | 20.0% | 40.0% |
+ | Law | **40.0%** | 30.0% | 30.0% | 35.0% |
+ | Psychology | **85.0%** | 80.0% | 70.0% | 75.0% |
+ | Biology | 80.0% | 70.0% | **85.0%** | 80.0% |
+ | Chemistry | **55.0%** | 40.0% | 35.0% | 35.0% |
+ | History | **65.0%** | 60.0% | 55.0% | **65.0%** |
+ | Other | **55.0%** | 50.0% | 45.0% | 50.0% |
+ | Health | **75.0%** | 40.0% | 60.0% | 65.0% |
+ | Economics | **80.0%** | 75.0% | 65.0% | 70.0% |
+ | Math | **45.0%** | 35.0% | 15.0% | 40.0% |
+ | Physics | **50.0%** | 45.0% | 45.0% | 45.0% |
+ | Computer Science | **60.0%** | 55.0% | 55.0% | **60.0%** |
+ | Philosophy | 55.0% | **60.0%** | 45.0% | 50.0% |
+ | Engineering | 35.0% | **40.0%** | 25.0% | 35.0% |
+
+ *Note: MMLU-PRO overall tested with 100 questions; categories tested with 20 questions each.*
+
+ **PubmedQA**
+
+ | Model Name | Success % |
  | --- | --- |
- | Ball on cup | OK |
- | Door window combination | OK |
- | Big duck small horse | OK |
- | JSON | OK |
- | Killers | OK |
- | Dragon | OK |
- | Poem | OK |
- | Jane faster | OK |
- | Shirts | OK |
- | Sisters | OK |
- | Python snake game | OK* |
- | Story | OK |
-
- *best I ever saw on local LLMs including Qwen2 72b at 8bpw, Llama 3 70b 8bpw
-
- Note: See sample generations on the main folder of the repo.
+ | Cathallama-70B.Q4_0.gguf | 73.00% |
+ | turboderp_Cat-Llama-3-70B-instruct.Q4_0.gguf | **76.00%** |
+ | Nexusflow_Athene-70B.Q4_0.gguf | 67.00% |
+ | Meta-Llama-3.1-70B-Instruct.Q4_0.gguf | 72.00% |
+
+ **Request**
+ --------------
+ Please make GGUFs for this. I can't right now, for reasons.
+
+ And if you are hiring in the EU or can sponsor a visa, PM me :D
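For reference, the testing setup described above (the sampler settings from the Hyperparameters list plus the llama.cpp build and the -fa -ngl -1 -ctk f16 --no-mmap flags) maps onto roughly this llama-cli invocation. The sampler flag names are llama.cpp's command-line equivalents of the UI labels as I understand them, and the model path and prompt are placeholders.

```
# Sketch of the testing setup, assuming llama.cpp's llama-cli from around
# build b3527. --temp was 0.0 for the automated benchmarks, 0.9 for manual tests.
./llama-cli -m Cathallama-70B.Q4_0.gguf \
  -fa -ngl -1 -ctk f16 --no-mmap \
  --temp 0.9 --repeat-penalty 1.05 --repeat-last-n 256 --penalize-nl \
  -p "Your prompt here"
```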