---
language:
- en
license: apache-2.0
library_name: transformers
tags:
- roleplay
- text-generation-inference
model-index:
- name: EstopianMaid-13B
  results:
  - task:
      type: text-generation
      name: Text Generation
    dataset:
      name: AI2 Reasoning Challenge (25-Shot)
      type: ai2_arc
      config: ARC-Challenge
      split: test
      args:
        num_few_shot: 25
    metrics:
    - type: acc_norm
      value: 60.49
      name: normalized accuracy
    source:
      url: https://huggingface.co/spaces/HuggingFaceH4/open_llm_leaderboard?query=KatyTheCutie/EstopianMaid-13B
      name: Open LLM Leaderboard
  - task:
      type: text-generation
      name: Text Generation
    dataset:
      name: HellaSwag (10-Shot)
      type: hellaswag
      split: validation
      args:
        num_few_shot: 10
    metrics:
    - type: acc_norm
      value: 83.49
      name: normalized accuracy
    source:
      url: https://huggingface.co/spaces/HuggingFaceH4/open_llm_leaderboard?query=KatyTheCutie/EstopianMaid-13B
      name: Open LLM Leaderboard
  - task:
      type: text-generation
      name: Text Generation
    dataset:
      name: MMLU (5-Shot)
      type: cais/mmlu
      config: all
      split: test
      args:
        num_few_shot: 5
    metrics:
    - type: acc
      value: 56.18
      name: accuracy
    source:
      url: https://huggingface.co/spaces/HuggingFaceH4/open_llm_leaderboard?query=KatyTheCutie/EstopianMaid-13B
      name: Open LLM Leaderboard
  - task:
      type: text-generation
      name: Text Generation
    dataset:
      name: TruthfulQA (0-shot)
      type: truthful_qa
      config: multiple_choice
      split: validation
      args:
        num_few_shot: 0
    metrics:
    - type: mc2
      value: 52.35
    source:
      url: https://huggingface.co/spaces/HuggingFaceH4/open_llm_leaderboard?query=KatyTheCutie/EstopianMaid-13B
      name: Open LLM Leaderboard
  - task:
      type: text-generation
      name: Text Generation
    dataset:
      name: Winogrande (5-shot)
      type: winogrande
      config: winogrande_xl
      split: validation
      args:
        num_few_shot: 5
    metrics:
    - type: acc
      value: 75.53
      name: accuracy
    source:
      url: https://huggingface.co/spaces/HuggingFaceH4/open_llm_leaderboard?query=KatyTheCutie/EstopianMaid-13B
      name: Open LLM Leaderboard
  - task:
      type: text-generation
      name: Text Generation
    dataset:
      name: GSM8k (5-shot)
      type: gsm8k
      config: main
      split: test
      args:
        num_few_shot: 5
    metrics:
    - type: acc
      value: 9.17
      name: accuracy
    source:
      url: https://huggingface.co/spaces/HuggingFaceH4/open_llm_leaderboard?query=KatyTheCutie/EstopianMaid-13B
      name: Open LLM Leaderboard
---
![image/png](https://cdn-uploads.huggingface.co/production/uploads/653a2392341143f7774424d8/fyK_RtEjb9sLF_Mq0nZm2.png)
Based on user feedback, EstopianMaid:
- Sticks closely to the character card.
- Maintains coherence in settings with multiple characters.
- Is able to create new scenarios.
- Includes a feature from Thespis:
![image/webp](https://cdn-uploads.huggingface.co/production/uploads/653a2392341143f7774424d8/1Z4P7XshVOW8fLg9pey4H.webp)
- Prompt template: Alpaca

      ### Instruction:
      {prompt}
      ### Response:

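When running the model outside SillyTavern (for example directly with transformers), the template has to be filled in by hand. The sketch below assumes the single-newline layout shown above and that the model's reply begins on the line after `### Response:`; the names `ALPACA_TEMPLATE` and `build_alpaca_prompt` are hypothetical helpers, not part of any library.

```python
# Hypothetical helper: wraps a user prompt in the Alpaca template used by EstopianMaid.
# Assumes single newlines between sections, as written in the template above.
ALPACA_TEMPLATE = "### Instruction:\n{prompt}\n### Response:\n"


def build_alpaca_prompt(prompt: str) -> str:
    """Return the full prompt string; the model continues after '### Response:'."""
    # str.format only parses the template, so braces inside `prompt` are kept verbatim.
    return ALPACA_TEMPLATE.format(prompt=prompt)


print(build_alpaca_prompt("Describe the tavern as the party walks in."))
```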
Recommended settings:
- Preset: SillyTavern Default
- Temperature: 0.7
- Min-P: 0.3
- Amount to Gen: 256 tokens
- Top P: 1
- Repetition penalty: 1.10
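For users of transformers rather than SillyTavern, the sampler values above can be expressed as `generate()` keyword arguments. This is a sketch under assumptions: it maps only the listed values (the rest of the SillyTavern Default preset has no direct equivalent here), and it assumes a transformers release recent enough to accept `min_p`. The `min_p_filter` function is a hypothetical plain-Python illustration of what a Min-P of 0.3 does (discard tokens less than 30% as likely as the top token), not the library's implementation.

```python
# Assumed mapping of the recommended settings onto transformers generate() kwargs.
GENERATION_KWARGS = {
    "do_sample": True,           # sampling must be enabled for temperature/top_p/min_p to apply
    "temperature": 0.7,
    "min_p": 0.3,                # requires a transformers version with min-p support
    "max_new_tokens": 256,       # "Amount to Gen"
    "top_p": 1.0,                # 1.0 leaves nucleus (top-p) filtering effectively off
    "repetition_penalty": 1.10,
}


def min_p_filter(probs: list[float], min_p: float) -> list[float]:
    """Illustrative min-p: zero tokens below min_p * max(probs), then renormalize."""
    threshold = min_p * max(probs)
    kept = [p if p >= threshold else 0.0 for p in probs]
    total = sum(kept)
    return [p / total for p in kept]


# With min_p = 0.3 and a top probability of 0.6, anything below 0.18 is dropped.
print(min_p_filter([0.6, 0.25, 0.1, 0.05], GENERATION_KWARGS["min_p"]))
```

With a loaded model and tokenized `inputs`, these would be passed as `model.generate(**inputs, **GENERATION_KWARGS)`.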
Models used:
- BlueNipples/TimeCrystal-l2-13B
- cgato/Thespis-13b-DPO-v0.7
- KoboldAI/LLaMA2-13B-Estopia
- NeverSleep/Noromaid-13B-0.4-DPO
- Doctor-Shotgun/cat-v1.0-13b
Feedback is always appreciated!
Thanks to KoboldAI for the use of their MergeBox, and to Caitlyn G. for their support and feedback.
# [Open LLM Leaderboard Evaluation Results](https://huggingface.co/spaces/HuggingFaceH4/open_llm_leaderboard)
Detailed results can be found [here](https://huggingface.co/datasets/open-llm-leaderboard/details_KatyTheCutie__EstopianMaid-13B)
| Metric |Value|
|---------------------------------|----:|
|Avg. |56.20|
|AI2 Reasoning Challenge (25-Shot)|60.49|
|HellaSwag (10-Shot) |83.49|
|MMLU (5-Shot) |56.18|
|TruthfulQA (0-shot) |52.35|
|Winogrande (5-shot) |75.53|
|GSM8k (5-shot) | 9.17|
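The Avg. row matches the unweighted mean of the six benchmark scores, which can be checked quickly:

```python
# Scores copied from the leaderboard table; Avg. is their plain (unweighted) mean.
scores = {
    "AI2 Reasoning Challenge (25-Shot)": 60.49,
    "HellaSwag (10-Shot)": 83.49,
    "MMLU (5-Shot)": 56.18,
    "TruthfulQA (0-shot)": 52.35,
    "Winogrande (5-shot)": 75.53,
    "GSM8k (5-shot)": 9.17,
}

average = sum(scores.values()) / len(scores)
print(f"{average:.2f}")  # prints 56.20
```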