| Model | HellaSwag | EQ_Bench | % Parsed (EQ) |
|---|---|---|---|
| athirdpath/NSFW_DPO_vmgb-7b | 85.36 | 74.83 | 100 |
| Intel/neural-chat-7b-v3-1 | 79.76 | 62.26 | 100 |
| jondurbin/airoboros-m-7b-3.1.2 | 81.34 | 38.52 | 100 |
| jondurbin/cinematika-7b-v0.1 | 80.31 | 44.85 | 100 |
| KoboldAI/Mistral-7B-Erebus-v3 | 76.65 | 18.19 | 97.66 |
| KoboldAI/Mistral-7B-Holodeck-1 | 79.19 | 2.10 | 98.25 |
| migtissera/Synthia-7B-v3.0 | 81.74 | 15.03 | 94.74 |
| mlabonne/NeuralBeagle14-7B | 86.46 | 74.21 | 99.42 |
| NousResearch/Hermes-2-Pro-Mistral-7B | 80.56 | 65.93 | 100 |
| Open-Orca/Mistral-7B-OpenOrca | 81.67 | 63.98 | 99.42 |
| rwitz/go-bruins | 84.92 | 73.62 | 100 |
| SanjiWatsuki/Kunoichi-7B | 85.25 | 72.36 | 100 |
| senseable/WestLake-7B-v2 | 87.42 | 77.87 | 100 |
| teknium/OpenHermes-2.5-Mistral-7B | 81.68 | 65.75 | 100 |
| Undi95/Toppy-M-7B | 83.52 | 66.57 | 100 |
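
The results are listed alphabetically rather than by score, so a quick way to rank them may be useful. The following is a minimal sketch, not part of the original benchmark tooling: it assumes each result is available as a whitespace-separated line (model, HellaSwag, EQ_Bench, % Parsed), and the names `ROWS`, `parse_rows`, and `rank_by` are hypothetical helpers introduced only for illustration. Only four rows from the results are included to keep it short.

```python
# Four rows copied from the results in this README:
# model, HellaSwag, EQ_Bench, % Parsed (EQ)
ROWS = """\
senseable/WestLake-7B-v2 87.42 77.87 100
mlabonne/NeuralBeagle14-7B 86.46 74.21 99.42
athirdpath/NSFW_DPO_vmgb-7b 85.36 74.83 100
KoboldAI/Mistral-7B-Holodeck-1 79.19 2.10 98.25
"""


def parse_rows(text):
    """Turn whitespace-separated result lines into a list of dicts."""
    results = []
    for line in text.strip().splitlines():
        # Model IDs contain no spaces, so a plain split yields four fields.
        model, hellaswag, eq_bench, parsed = line.split()
        results.append(
            {
                "model": model,
                "hellaswag": float(hellaswag),
                "eq_bench": float(eq_bench),
                "parsed_pct": float(parsed),
            }
        )
    return results


def rank_by(results, key):
    """Return results sorted from highest to lowest on the given score."""
    return sorted(results, key=lambda r: r[key], reverse=True)


ranked = rank_by(parse_rows(ROWS), "eq_bench")
for r in ranked:
    print(f'{r["model"]:<36} {r["eq_bench"]:6.2f}')
```

Passing `"hellaswag"` instead of `"eq_bench"` ranks the same rows by the other benchmark.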