Update README.md
README.md

Dataset of highly filtered and curated question and answer pairs. Release TBD.

`lilloukas/Platypus-30B` was instruction fine-tuned using LoRA on 4 A100 80GB GPUs. For training details and inference instructions, please see the [Platypus-30B](https://github.com/arielnlee/Platypus-30B.git) GitHub repo.
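
For a quick local test outside the scripts in that repo, the merged weights on the Hub can be loaded with the standard Hugging Face `transformers` API. The sketch below is only an illustration, not the repo's official inference recipe: the Alpaca-style prompt template and the generation settings are assumptions, and a 30B model in float16 needs roughly 60 GB of GPU memory, so the weights are sharded across whatever GPUs are available.
```
# Minimal inference sketch (assumed setup, not the official Platypus-30B inference script).
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "lilloukas/Platypus-30B"

tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(
    model_id,
    torch_dtype=torch.float16,  # ~60 GB of weights in half precision
    device_map="auto",          # shard across available GPUs (requires accelerate)
)

# The Alpaca-style prompt below is an assumption; check the Platypus-30B repo for the
# template actually used during training.
prompt = "### Instruction:\nSummarize what instruction fine-tuning does.\n\n### Response:\n"
inputs = tokenizer(prompt, return_tensors="pt").to(model.device)

with torch.no_grad():
    output = model.generate(**inputs, max_new_tokens=256, do_sample=False)

print(tokenizer.decode(output[0], skip_special_tokens=True))
```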
## Reproducing Evaluation Results
Install LM Evaluation Harness:
```
git clone https://github.com/EleutherAI/lm-evaluation-harness
cd lm-evaluation-harness
pip install -e .
```
Each task was evaluated on a single A100 80GB GPU.

ARC:
```
python main.py --model hf-causal-experimental --model_args pretrained=lilloukas/Platypus-30B --tasks arc_challenge --batch_size 1 --no_cache --write_out --output_path results/Platypus-30B/arc_challenge_25shot.json --device cuda --num_fewshot 25
```

HellaSwag:
```
python main.py --model hf-causal-experimental --model_args pretrained=lilloukas/Platypus-30B --tasks hellaswag --batch_size 1 --no_cache --write_out --output_path results/Platypus-30B/hellaswag_10shot.json --device cuda --num_fewshot 10
```

MMLU:
```
python main.py --model hf-causal-experimental --model_args pretrained=lilloukas/Platypus-30B --tasks hendrycksTest-* --batch_size 1 --no_cache --write_out --output_path results/Platypus-30B/mmlu_5shot.json --device cuda --num_fewshot 5
```

TruthfulQA:
```
python main.py --model hf-causal-experimental --model_args pretrained=lilloukas/Platypus-30B --tasks truthfulqa_mc --batch_size 1 --no_cache --write_out --output_path results/Platypus-30B/truthfulqa_0shot.json --device cuda
```
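
Each run above writes its metrics to the JSON file passed via `--output_path`. The exact JSON layout depends on the harness version; the sketch below assumes a top-level `results` mapping from task name to metric values, as written by this (older) harness, and simply prints whatever numeric metrics it finds.
```
# Summarize the JSON files written by --output_path.
# Assumes a top-level "results" key as in the older lm-evaluation-harness output;
# adjust the key names if your harness version writes a different structure.
import json
from pathlib import Path

for path in sorted(Path("results/Platypus-30B").glob("*.json")):
    data = json.loads(path.read_text())
    print(f"== {path.name} ==")
    for task, metrics in data.get("results", {}).items():
        for name, value in metrics.items():
            if isinstance(value, float):
                print(f"  {task:<30} {name:<12} {value:.4f}")
```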