|
--- |
|
license: apache-2.0 |
|
base_model: BEE-spoke-data/smol_llama-220M-GQA |
|
datasets: |
|
- teknium/openhermes |
|
inference: |
|
parameters: |
|
do_sample: true |
|
renormalize_logits: true |
|
temperature: 0.25 |
|
top_p: 0.95 |
|
top_k: 50 |
|
min_new_tokens: 2 |
|
max_new_tokens: 96 |
|
repetition_penalty: 1.03 |
|
no_repeat_ngram_size: 5 |
|
epsilon_cutoff: 0.0008 |
|
widget: |
|
- text: "Below is an instruction that describes a task, paired with an input that\ |
|
\ provides further context. Write a response that appropriately completes the\ |
|
\ request. \n \n### Instruction: \n \nWrite an ode to Chipotle burritos.\ |
|
\ \n \n### Response: \n" |
|
example_title: burritos |
|
model-index: |
|
- name: smol_llama-220M-openhermes |
|
results: |
|
- task: |
|
type: text-generation |
|
name: Text Generation |
|
dataset: |
|
name: AI2 Reasoning Challenge (25-Shot) |
|
type: ai2_arc |
|
config: ARC-Challenge |
|
split: test |
|
args: |
|
num_few_shot: 25 |
|
metrics: |
|
- type: acc_norm |
|
value: 25.17 |
|
name: normalized accuracy |
|
source: |
|
url: https://huggingface.co/spaces/HuggingFaceH4/open_llm_leaderboard?query=BEE-spoke-data/smol_llama-220M-openhermes |
|
name: Open LLM Leaderboard |
|
- task: |
|
type: text-generation |
|
name: Text Generation |
|
dataset: |
|
name: HellaSwag (10-Shot) |
|
type: hellaswag |
|
split: validation |
|
args: |
|
num_few_shot: 10 |
|
metrics: |
|
- type: acc_norm |
|
value: 28.98 |
|
name: normalized accuracy |
|
source: |
|
url: https://huggingface.co/spaces/HuggingFaceH4/open_llm_leaderboard?query=BEE-spoke-data/smol_llama-220M-openhermes |
|
name: Open LLM Leaderboard |
|
- task: |
|
type: text-generation |
|
name: Text Generation |
|
dataset: |
|
name: MMLU (5-Shot) |
|
type: cais/mmlu |
|
config: all |
|
split: test |
|
args: |
|
num_few_shot: 5 |
|
metrics: |
|
- type: acc |
|
value: 26.17 |
|
name: accuracy |
|
source: |
|
url: https://huggingface.co/spaces/HuggingFaceH4/open_llm_leaderboard?query=BEE-spoke-data/smol_llama-220M-openhermes |
|
name: Open LLM Leaderboard |
|
- task: |
|
type: text-generation |
|
name: Text Generation |
|
dataset: |
|
name: TruthfulQA (0-shot) |
|
type: truthful_qa |
|
config: multiple_choice |
|
split: validation |
|
args: |
|
num_few_shot: 0 |
|
metrics: |
|
- type: mc2 |
|
value: 43.08 |
|
source: |
|
url: https://huggingface.co/spaces/HuggingFaceH4/open_llm_leaderboard?query=BEE-spoke-data/smol_llama-220M-openhermes |
|
name: Open LLM Leaderboard |
|
- task: |
|
type: text-generation |
|
name: Text Generation |
|
dataset: |
|
name: Winogrande (5-shot) |
|
type: winogrande |
|
config: winogrande_xl |
|
split: validation |
|
args: |
|
num_few_shot: 5 |
|
metrics: |
|
- type: acc |
|
value: 52.01 |
|
name: accuracy |
|
source: |
|
url: https://huggingface.co/spaces/HuggingFaceH4/open_llm_leaderboard?query=BEE-spoke-data/smol_llama-220M-openhermes |
|
name: Open LLM Leaderboard |
|
- task: |
|
type: text-generation |
|
name: Text Generation |
|
dataset: |
|
name: GSM8k (5-shot) |
|
type: gsm8k |
|
config: main |
|
split: test |
|
args: |
|
num_few_shot: 5 |
|
metrics: |
|
- type: acc |
|
value: 0.61 |
|
name: accuracy |
|
source: |
|
url: https://huggingface.co/spaces/HuggingFaceH4/open_llm_leaderboard?query=BEE-spoke-data/smol_llama-220M-openhermes |
|
name: Open LLM Leaderboard |
|
- task: |
|
type: text-generation |
|
name: Text Generation |
|
dataset: |
|
name: IFEval (0-Shot) |
|
type: HuggingFaceH4/ifeval |
|
args: |
|
num_few_shot: 0 |
|
metrics: |
|
- type: inst_level_strict_acc and prompt_level_strict_acc |
|
value: 15.55 |
|
name: strict accuracy |
|
source: |
|
url: https://huggingface.co/spaces/open-llm-leaderboard/open_llm_leaderboard?query=BEE-spoke-data/smol_llama-220M-openhermes |
|
name: Open LLM Leaderboard |
|
- task: |
|
type: text-generation |
|
name: Text Generation |
|
dataset: |
|
name: BBH (3-Shot) |
|
type: BBH |
|
args: |
|
num_few_shot: 3 |
|
metrics: |
|
- type: acc_norm |
|
value: 3.11 |
|
name: normalized accuracy |
|
source: |
|
url: https://huggingface.co/spaces/open-llm-leaderboard/open_llm_leaderboard?query=BEE-spoke-data/smol_llama-220M-openhermes |
|
name: Open LLM Leaderboard |
|
- task: |
|
type: text-generation |
|
name: Text Generation |
|
dataset: |
|
name: MATH Lvl 5 (4-Shot) |
|
type: hendrycks/competition_math |
|
args: |
|
num_few_shot: 4 |
|
metrics: |
|
- type: exact_match |
|
value: 0.0 |
|
name: exact match |
|
source: |
|
url: https://huggingface.co/spaces/open-llm-leaderboard/open_llm_leaderboard?query=BEE-spoke-data/smol_llama-220M-openhermes |
|
name: Open LLM Leaderboard |
|
- task: |
|
type: text-generation |
|
name: Text Generation |
|
dataset: |
|
name: GPQA (0-shot) |
|
type: Idavidrein/gpqa |
|
args: |
|
num_few_shot: 0 |
|
metrics: |
|
- type: acc_norm |
|
value: 2.35 |
|
name: acc_norm |
|
source: |
|
url: https://huggingface.co/spaces/open-llm-leaderboard/open_llm_leaderboard?query=BEE-spoke-data/smol_llama-220M-openhermes |
|
name: Open LLM Leaderboard |
|
- task: |
|
type: text-generation |
|
name: Text Generation |
|
dataset: |
|
name: MuSR (0-shot) |
|
type: TAUR-Lab/MuSR |
|
args: |
|
num_few_shot: 0 |
|
metrics: |
|
- type: acc_norm |
|
value: 6.22 |
|
name: acc_norm |
|
source: |
|
url: https://huggingface.co/spaces/open-llm-leaderboard/open_llm_leaderboard?query=BEE-spoke-data/smol_llama-220M-openhermes |
|
name: Open LLM Leaderboard |
|
- task: |
|
type: text-generation |
|
name: Text Generation |
|
dataset: |
|
name: MMLU-PRO (5-shot) |
|
type: TIGER-Lab/MMLU-Pro |
|
config: main |
|
split: test |
|
args: |
|
num_few_shot: 5 |
|
metrics: |
|
- type: acc |
|
value: 1.34 |
|
name: accuracy |
|
source: |
|
url: https://huggingface.co/spaces/open-llm-leaderboard/open_llm_leaderboard?query=BEE-spoke-data/smol_llama-220M-openhermes |
|
name: Open LLM Leaderboard |
|
--- |
|
|
|
|
|
# BEE-spoke-data/smol_llama-220M-openhermes |
|
|
|
> Please note that this is an experiment, and the model has limitations because it is smol. |
|
|
|
|
|
The prompt format is Alpaca:
|
|
|
|
|
``` |
|
Below is an instruction that describes a task, paired with an input that |
|
provides further context. Write a response that appropriately completes |
|
the request. |
|
|
|
### Instruction: |
|
|
|
How can I increase my meme production/output? Currently, I only create them in ancient Babylonian, which is time-consuming.
|
|
|
### Inputs: |
|
|
|
### Response: |
|
``` |
|
|
|
The model was trained with inputs, so if you have an input (such as some text to ask a question about), include it under `### Inputs:`.
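For clarity, here is a minimal sketch (not from the model card itself) of building the prompt in Python; `build_prompt` and its arguments are hypothetical names, and the exact whitespace around the headers may differ slightly from what was used in training.

```python
def build_prompt(instruction: str, input_text: str = "") -> str:
    """Assemble an Alpaca-style prompt for this checkpoint.

    `build_prompt` and its argument names are illustrative, not part of
    the model card; the header layout follows the example above.
    """
    prompt = (
        "Below is an instruction that describes a task, paired with an input that "
        "provides further context. Write a response that appropriately completes "
        "the request.\n\n"
        f"### Instruction:\n\n{instruction}\n\n"
    )
    if input_text:
        # The model was trained with inputs under this header.
        prompt += f"### Inputs:\n\n{input_text}\n\n"
    prompt += "### Response:\n"
    return prompt
```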
|
|
|
|
|
## Example |
|
|
|
Output for the prompt above. The inference API is set to sample with a low temperature, so you should see (_at least slightly_) different generations each time.
|
|
|
|
|
![image/png](https://cdn-uploads.huggingface.co/production/uploads/60bccec062080d33f875cd0c/0nFP2jsBkritnryKmI8NV.png) |
|
|
|
Note that the inference API parameters used here are an initial educated guess and may be updated over time:
|
|
|
```yml |
|
inference: |
|
parameters: |
|
do_sample: true |
|
renormalize_logits: true |
|
temperature: 0.25 |
|
top_p: 0.95 |
|
top_k: 50 |
|
min_new_tokens: 2 |
|
max_new_tokens: 96 |
|
repetition_penalty: 1.03 |
|
no_repeat_ngram_size: 5 |
|
epsilon_cutoff: 0.0008 |
|
``` |
|
|
|
Feel free to experiment with the parameters using the model in Python, as in the sketch below, and let us know if you get improved results with other parameters!
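As a starting point, here is a minimal sketch of sampling from the checkpoint with `transformers` using the parameters above; it assumes a standard `transformers` + `torch` install, and `pad_token_id` is set explicitly only to silence the usual missing-pad-token warning.

```python
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "BEE-spoke-data/smol_llama-220M-openhermes"
tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(model_id)

# Alpaca-style prompt, as described above (no input for this example).
prompt = (
    "Below is an instruction that describes a task, paired with an input that "
    "provides further context. Write a response that appropriately completes "
    "the request.\n\n"
    "### Instruction:\n\nWrite an ode to Chipotle burritos.\n\n"
    "### Response:\n"
)
inputs = tokenizer(prompt, return_tensors="pt")
outputs = model.generate(
    **inputs,
    do_sample=True,
    renormalize_logits=True,
    temperature=0.25,
    top_p=0.95,
    top_k=50,
    min_new_tokens=2,
    max_new_tokens=96,
    repetition_penalty=1.03,
    no_repeat_ngram_size=5,
    epsilon_cutoff=0.0008,
    pad_token_id=tokenizer.eos_token_id,  # avoids the missing-pad-token warning
)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
```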
|
|
|
## Data |
|
|
|
Note that **this checkpoint** was fine-tuned on `teknium/openhermes`, which is synthetic data generated by an OpenAI model. This means usage of this checkpoint should follow OpenAI's terms of use: https://openai.com/policies/terms-of-use
|
|
|
|
|
--- |
|
|
|
# [Open LLM Leaderboard Evaluation Results](https://huggingface.co/spaces/HuggingFaceH4/open_llm_leaderboard) |
|
Detailed results can be found [here](https://huggingface.co/datasets/open-llm-leaderboard/details_BEE-spoke-data__smol_llama-220M-openhermes) |
|
|
|
| Metric |Value| |
|
|---------------------------------|----:| |
|
|Avg. |29.34| |
|
|AI2 Reasoning Challenge (25-Shot)|25.17| |
|
|HellaSwag (10-Shot) |28.98| |
|
|MMLU (5-Shot) |26.17| |
|
|TruthfulQA (0-shot) |43.08| |
|
|Winogrande (5-shot) |52.01| |
|
|GSM8k (5-shot) | 0.61| |
|
|
|
|
|
# [Open LLM Leaderboard Evaluation Results](https://huggingface.co/spaces/open-llm-leaderboard/open_llm_leaderboard) |
|
Detailed results can be found [here](https://huggingface.co/datasets/open-llm-leaderboard/details_BEE-spoke-data__smol_llama-220M-openhermes) |
|
|
|
| Metric |Value| |
|
|-------------------|----:| |
|
|Avg. | 4.76| |
|
|IFEval (0-Shot) |15.55| |
|
|BBH (3-Shot) | 3.11| |
|
|MATH Lvl 5 (4-Shot)| 0.00| |
|
|GPQA (0-shot) | 2.35| |
|
|MuSR (0-shot) | 6.22| |
|
|MMLU-PRO (5-shot) | 1.34| |
|
|
|
|