gbueno86
/

Cathallama-70B

@@ -1,10 +1,48 @@
 **Testing**
 =====================
 **Hyperparameters**
 ---------------
-* **Temperature**: 0.9
 * **Penalize repeat sequence**: 1.05
 * **Consider N tokens for penalize**: 256
 * **Penalize repetition of newlines**
@@ -16,30 +54,80 @@
 ------------------
 * b3527-2-g2d5dd7bb
-**File**
 ------------------
 * Cathallama-70B.Q4_0.gguf
-**Test Cases**
 --------------
-| Test Case | Result |
 | --- | --- |
-| Ball on cup | OK |
-| Door window combination | OK |
-| Big duck small horse | OK |
-| JSON | OK |
-| Killers | OK |
-| Dragon | OK |
-| Poem | OK |
-| Jane faster | OK |
-| Shirts | OK |
-| Sisters | OK |
-| Python snake game | OK* |
-| Story | OK |
-*best I ever saw on local LLMs including Qwen2 72b at 8bpw, Llama 3 70b 8bpw
-Note: See sample generations on the main folder of the repo.

+---
+license: llama3.1
+language:
+- en
+---
+![image/png](https://cdn-uploads.huggingface.co/production/uploads/649dc85249ae3a68334adcc6/KxaiZ7rDKkYlix99O9j5H.png)
+**Cathallama**
+=====================================
+Awesome model, my new daily driver.
+**Notable Performance**
+* 9 percentage points higher overall MMLU-PRO success rate than Llama 3.1 70B (51.0% vs. 42.0%)
+* Best or tied-for-best score in 11 of the 14 MMLU-PRO categories tested
+* Passed 11 of 14 manual test cases, the most of the four models tested
+**Creation workflow**
+=====================
+**Models merged**
+* meta-llama/Meta-Llama-3.1-70B-Instruct
+* turboderp/Cat-Llama-3-70B-instruct
+* Nexusflow/Athene-70B
+```
+flowchart TD
+    A[Nexusflow_Athene] -->|Merge with| B[Meta-Llama-3.1]
+    C[turboderp_Cat] -->|Merge with| D[Meta-Llama-3.1]
+    B -->| | E[Merge]
+    D -->| | E[Merge]
+    E[Merge] -->|Result| F[Cathallama]
+```
+![image/png](https://cdn-uploads.huggingface.co/production/uploads/649dc85249ae3a68334adcc6/bBcB194tAtsZjPUnI1pDQ.png)
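
The flowchart describes a two-stage merge: each fine-tune is first merged with Meta-Llama-3.1, and the two intermediate models are then merged into Cathallama. The card does not state the merge method, the weights, or the tooling, so the sketch below is only a toy illustration of the topology. It assumes plain linear interpolation with equal weights, and the tiny parameter dictionaries are hypothetical stand-ins for real checkpoints.

```python
# Toy illustration of the two-stage merge topology from the flowchart.
# ASSUMPTION: equal-weight linear interpolation. The card does not state
# the actual merge method, weights, or tooling.

def linear_merge(a, b, t=0.5):
    """Interpolate two state dicts (name -> list of floats) key by key."""
    if a.keys() != b.keys():
        raise ValueError("models must share the same parameter names")
    return {
        name: [(1.0 - t) * x + t * y for x, y in zip(a[name], b[name])]
        for name in a
    }

# Hypothetical stand-ins for real checkpoints: one tiny "layer" each.
llama_31 = {"layer.weight": [0.0, 2.0]}
athene = {"layer.weight": [4.0, 2.0]}
cat_llama = {"layer.weight": [0.0, 6.0]}

stage_a = linear_merge(athene, llama_31)         # Athene + Llama 3.1
stage_b = linear_merge(cat_llama, llama_31)      # Cat + Llama 3.1
cathallama_toy = linear_merge(stage_a, stage_b)  # final merge

print(cathallama_toy)
```

With equal weights at every stage, the base model ends up contributing half of the final result and each fine-tune a quarter. That is a property of this toy setup, not a claim about the real merge.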
 **Testing**
 =====================
 **Hyperparameters**
 ---------------
+* **Temperature**: 0.0 for automated testing, 0.9 for manual testing
 * **Penalize repeat sequence**: 1.05
 * **Consider N tokens for penalize**: 256
 * **Penalize repetition of newlines**
 ------------------
 * b3527-2-g2d5dd7bb
+* -fa -ngl -1 -ctk f16 --no-mmap
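
For anyone who wants to reproduce the settings, here is a rough sketch of how they might translate into a llama.cpp command line. It assumes the build tag and flags above refer to llama.cpp, that the binary is `llama-cli`, and that the three repetition settings map to `--repeat-penalty`, `--repeat-last-n`, and `--penalize-nl`. Flag names change between builds, so check `llama-cli --help` first. The `build_command` helper and the prompt are made up for illustration.

```python
import shlex

# ASSUMPTION: flag names follow the llama.cpp CLI around build b3527.
# Verify against `llama-cli --help` for your build before relying on them.
def build_command(model_path, prompt, temperature):
    """Return an argv list combining the sampling and runtime settings."""
    return [
        "llama-cli",
        "-m", model_path,
        "-p", prompt,
        "--temp", str(temperature),
        "--repeat-penalty", "1.05",   # Penalize repeat sequence
        "--repeat-last-n", "256",     # Consider N tokens for penalize
        "--penalize-nl",              # Penalize repetition of newlines
        "-fa", "-ngl", "-1", "-ctk", "f16", "--no-mmap",  # flags as listed
    ]

manual = build_command("Cathallama-70B.Q4_0.gguf", "Write a poem.", 0.9)
automated = build_command("Cathallama-70B.Q4_0.gguf", "Write a poem.", 0.0)

print(shlex.join(manual))
```

Building the command as a list and passing it to `subprocess.run` avoids shell-quoting problems with prompts that contain spaces or quotes.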
+**Tested Files**
 ------------------
 * Cathallama-70B.Q4_0.gguf
+* Nexusflow_Athene-70B.Q4_0.gguf
+* turboderp_Cat-Llama-3-70B-instruct.Q4_0.gguf
+* Meta-Llama-3.1-70B-Instruct.Q4_0.gguf
+**Tests**
 --------------
+**Manual testing**
+| Category | Test Case | Cathallama-70B.Q4_0.gguf | Nexusflow_Athene-70B.Q4_0.gguf | turboderp_Cat-Llama-3-70B-instruct.Q4_0.gguf | Meta-Llama-3.1-70B-Instruct.Q4_0.gguf |
+| --- | --- | --- | --- | --- | --- |
+| **Common Sense** | Ball on cup | OK | <span style="color: red;">KO</span> | <span style="color: red;">KO</span> | OK |
+|  | Big duck small horse | <span style="color: red;">KO</span> | OK | <span style="color: red;">KO</span> | OK |
+|  | Killers | OK | OK | <span style="color: red;">KO</span> | OK |
+|  | Strawberry r's | OK | <span style="color: red;">KO</span> | <span style="color: red;">KO</span> | <span style="color: red;">KO</span> |
+|  | 9.11 or 9.9 bigger | <span style="color: red;">KO</span> | OK | OK | <span style="color: red;">KO</span> |
+|  | Dragon or lens | <span style="color: red;">KO</span> | <span style="color: red;">KO</span> | <span style="color: red;">KO</span> | <span style="color: red;">KO</span> |
+|  | Shirts | OK | OK | <span style="color: red;">KO</span> | <span style="color: red;">KO</span> |
+|  | Sisters | OK | <span style="color: red;">KO</span> | <span style="color: red;">KO</span> | <span style="color: red;">KO</span> |
+|  | Jane faster | OK | OK | OK | OK |
+| **Programming** | JSON | OK | OK | OK | OK |
+|  | Python snake game | OK | <span style="color: red;">KO</span> | <span style="color: red;">KO</span> | <span style="color: red;">KO</span> |
+| **Math** | Door window combination | OK | OK | <span style="color: red;">KO</span> | <span style="color: red;">KO</span> |
+| **Smoke** | Poem | OK | OK | OK | OK |
+|  | Story | OK | OK | <span style="color: red;">KO</span> | OK |
+*Note: See sample_generations.txt in the main folder of the repo for the raw generations.*
+**MMLU-PRO**
+| Model | Success % |
+| --- | --- |
+| Cathallama-70B | **51.0%** |
+| turboderp_Cat-Llama-3-70B-instruct | 37.0% |
+| Nexusflow_Athene-70B | 41.0% |
+| Meta-Llama-3.1-70B-Instruct | 42.0% |
+| MMLU-PRO category | Cathallama-70B.Q4_0.gguf | Nexusflow_Athene-70B.Q4_0.gguf | turboderp_Cat-Llama-3-70B-instruct.Q4_0.gguf | Meta-Llama-3.1-70B-Instruct.Q4_0.gguf |
+| --- | --- | --- | --- | --- |
+| Business | **50.0%** | 45.0% | 20.0% | 40.0% |
+| Law | **40.0%** | 30.0% | 30.0% | 35.0% |
+| Psychology | **85.0%** | 80.0% | 70.0% | 75.0% |
+| Biology | 80.0% | 70.0% | **85.0%** | 80.0% |
+| Chemistry | **55.0%** | 40.0% | 35.0% | 35.0% |
+| History | **65.0%** | 60.0% | 55.0% | **65.0%** |
+| Other | **55.0%** | 50.0% | 45.0% | 50.0% |
+| Health | **75.0%** | 40.0% | 60.0% | 65.0% |
+| Economics | **80.0%** | 75.0% | 65.0% | 70.0% |
+| Math | **45.0%** | 35.0% | 15.0% | 40.0% |
+| Physics | **50.0%** | 45.0% | 45.0% | 45.0% |
+| Computer Science | **60.0%** | 55.0% | 55.0% | **60.0%** |
+| Philosophy | 55.0% | **60.0%** | 45.0% | 50.0% |
+| Engineering | 35.0% | **40.0%** | 25.0% | 35.0% |
+*Note: The overall MMLU-PRO score was measured on 100 questions. Each category was tested with 20 questions.*
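
Since the sample sizes are small, it can help to estimate how much sampling noise sits behind each percentage. The sketch below uses the standard binomial standard error, sqrt(p(1-p)/n). It treats every question as an independent trial, which is a simplifying assumption and not something the card states.

```python
import math

def standard_error(p, n):
    """Binomial standard error of an observed success rate p over n questions."""
    return math.sqrt(p * (1.0 - p) / n)

# Overall scores from the MMLU-PRO table (100 questions each).
overall = {"Cathallama-70B": 0.51, "Meta-Llama-3.1-70B-Instruct": 0.42}
for model, p in overall.items():
    print(f"{model}: {p:.0%} +/- {standard_error(p, 100):.1%} (n=100)")

# Per-category scores use only 20 questions, so they are noisier.
print(f"category at 50%: +/- {standard_error(0.5, 20):.1%} (n=20)")
```

Under this assumption, one standard error is about 5 points for the overall scores and about 11 points for a category near 50%.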
+**PubmedQA**
+| Model Name | Success % |
 | --- | --- |
+| Cathallama-70B.Q4_0.gguf | 73.00% |
+| turboderp_Cat-Llama-3-70B-instruct.Q4_0.gguf | **76.00%** |
+| Nexusflow_Athene-70B.Q4_0.gguf | 67.00% |
+| Meta-Llama-3.1-70B-Instruct.Q4_0.gguf | 72.00% |
+**Request**
+--------------
+Please make GGUFs for this. I can't now for reasons.
+And if you are hiring in the EU or can sponsor a visa, PM me :D