leaderboard-pr-bot committed on
Commit
25bd7ee
1 Parent(s): f0076c4

Adding Evaluation Results

This is an automated PR created with https://huggingface.co/spaces/Weyaxi/open-llm-leaderboard-results-pr

The purpose of this PR is to add evaluation results from the Open LLM Leaderboard to your model card.

If you encounter any issues, please report them to https://huggingface.co/spaces/Weyaxi/open-llm-leaderboard-results-pr/discussions

Files changed (1)
  1. README.md +117 -1
README.md CHANGED
@@ -3,6 +3,109 @@ license: apache-2.0
 tags:
 - llama-2
 - roleplaying
+model-index:
+- name: TimeCrystal-l2-13B
+  results:
+  - task:
+      type: text-generation
+      name: Text Generation
+    dataset:
+      name: AI2 Reasoning Challenge (25-Shot)
+      type: ai2_arc
+      config: ARC-Challenge
+      split: test
+      args:
+        num_few_shot: 25
+    metrics:
+    - type: acc_norm
+      value: 61.18
+      name: normalized accuracy
+    source:
+      url: https://huggingface.co/spaces/HuggingFaceH4/open_llm_leaderboard?query=BlueNipples/TimeCrystal-l2-13B
+      name: Open LLM Leaderboard
+  - task:
+      type: text-generation
+      name: Text Generation
+    dataset:
+      name: HellaSwag (10-Shot)
+      type: hellaswag
+      split: validation
+      args:
+        num_few_shot: 10
+    metrics:
+    - type: acc_norm
+      value: 83.71
+      name: normalized accuracy
+    source:
+      url: https://huggingface.co/spaces/HuggingFaceH4/open_llm_leaderboard?query=BlueNipples/TimeCrystal-l2-13B
+      name: Open LLM Leaderboard
+  - task:
+      type: text-generation
+      name: Text Generation
+    dataset:
+      name: MMLU (5-Shot)
+      type: cais/mmlu
+      config: all
+      split: test
+      args:
+        num_few_shot: 5
+    metrics:
+    - type: acc
+      value: 56.46
+      name: accuracy
+    source:
+      url: https://huggingface.co/spaces/HuggingFaceH4/open_llm_leaderboard?query=BlueNipples/TimeCrystal-l2-13B
+      name: Open LLM Leaderboard
+  - task:
+      type: text-generation
+      name: Text Generation
+    dataset:
+      name: TruthfulQA (0-shot)
+      type: truthful_qa
+      config: multiple_choice
+      split: validation
+      args:
+        num_few_shot: 0
+    metrics:
+    - type: mc2
+      value: 51.3
+    source:
+      url: https://huggingface.co/spaces/HuggingFaceH4/open_llm_leaderboard?query=BlueNipples/TimeCrystal-l2-13B
+      name: Open LLM Leaderboard
+  - task:
+      type: text-generation
+      name: Text Generation
+    dataset:
+      name: Winogrande (5-shot)
+      type: winogrande
+      config: winogrande_xl
+      split: validation
+      args:
+        num_few_shot: 5
+    metrics:
+    - type: acc
+      value: 75.37
+      name: accuracy
+    source:
+      url: https://huggingface.co/spaces/HuggingFaceH4/open_llm_leaderboard?query=BlueNipples/TimeCrystal-l2-13B
+      name: Open LLM Leaderboard
+  - task:
+      type: text-generation
+      name: Text Generation
+    dataset:
+      name: GSM8k (5-shot)
+      type: gsm8k
+      config: main
+      split: test
+      args:
+        num_few_shot: 5
+    metrics:
+    - type: acc
+      value: 27.52
+      name: accuracy
+    source:
+      url: https://huggingface.co/spaces/HuggingFaceH4/open_llm_leaderboard?query=BlueNipples/TimeCrystal-l2-13B
+      name: Open LLM Leaderboard
 ---
 This 13B model, TimeCrystal-l2-13B is built to maximize logic and instruct following, whilst also increasing the vividness of prose found in Chronos based models like Mythomax, over the more romantic prose, hopefully without losing the elegent narrative structure touch of newer models like synthia and xwin. TLDR: Attempt at more clever, better prose.
 
@@ -23,4 +126,17 @@ TimeStone + OpenStone (0.9,0,0) = TimeCrystal
 
 Props to all the mergers, fine tuners!
 
-All models in Merge: Many, lol.
+All models in Merge: Many, lol.
+# [Open LLM Leaderboard Evaluation Results](https://huggingface.co/spaces/HuggingFaceH4/open_llm_leaderboard)
+Detailed results can be found [here](https://huggingface.co/datasets/open-llm-leaderboard/details_BlueNipples__TimeCrystal-l2-13B)
+
+|             Metric              |Value|
+|---------------------------------|----:|
+|Avg.                             |59.26|
+|AI2 Reasoning Challenge (25-Shot)|61.18|
+|HellaSwag (10-Shot)              |83.71|
+|MMLU (5-Shot)                    |56.46|
+|TruthfulQA (0-shot)              |51.30|
+|Winogrande (5-shot)              |75.37|
+|GSM8k (5-shot)                   |27.52|
+
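
The `Avg.` row in the added table appears to be the unweighted mean of the six benchmark scores. The PR does not state the formula, so treat that as an assumption. A minimal Python sketch to check it, using the values from the table (the function name `leaderboard_average` is hypothetical and not part of any leaderboard tooling):

```python
# Scores copied from the results table added by this PR.
scores = {
    "AI2 Reasoning Challenge (25-Shot)": 61.18,
    "HellaSwag (10-Shot)": 83.71,
    "MMLU (5-Shot)": 56.46,
    "TruthfulQA (0-shot)": 51.30,
    "Winogrande (5-shot)": 75.37,
    "GSM8k (5-shot)": 27.52,
}


def leaderboard_average(values):
    """Unweighted mean rounded to two decimals (assumed aggregation rule)."""
    values = list(values)
    return round(sum(values) / len(values), 2)


average = leaderboard_average(scores.values())
print(average)  # 59.26, matching the Avg. row
```

The six scores sum to 355.54, and 355.54 / 6 ≈ 59.2567, which rounds to the 59.26 reported in the table.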