---
model-index:
- name: WizardLM-13B-1.0
  results:
  - task:
      type: text-generation
      name: Text Generation
    dataset:
      name: AI2 Reasoning Challenge (25-Shot)
      type: ai2_arc
      config: ARC-Challenge
      split: test
      args:
        num_few_shot: 25
    metrics:
    - type: acc_norm
      value: 28.5
      name: normalized accuracy
    source:
      url: https://huggingface.co/spaces/HuggingFaceH4/open_llm_leaderboard?query=victor123/WizardLM-13B-1.0
      name: Open LLM Leaderboard
  - task:
      type: text-generation
      name: Text Generation
    dataset:
      name: HellaSwag (10-Shot)
      type: hellaswag
      split: validation
      args:
        num_few_shot: 10
    metrics:
    - type: acc_norm
      value: 25.97
      name: normalized accuracy
    source:
      url: https://huggingface.co/spaces/HuggingFaceH4/open_llm_leaderboard?query=victor123/WizardLM-13B-1.0
      name: Open LLM Leaderboard
  - task:
      type: text-generation
      name: Text Generation
    dataset:
      name: MMLU (5-Shot)
      type: cais/mmlu
      config: all
      split: test
      args:
        num_few_shot: 5
    metrics:
    - type: acc
      value: 23.12
      name: accuracy
    source:
      url: https://huggingface.co/spaces/HuggingFaceH4/open_llm_leaderboard?query=victor123/WizardLM-13B-1.0
      name: Open LLM Leaderboard
  - task:
      type: text-generation
      name: Text Generation
    dataset:
      name: TruthfulQA (0-shot)
      type: truthful_qa
      config: multiple_choice
      split: validation
      args:
        num_few_shot: 0
    metrics:
    - type: mc2
      value: 48.61
    source:
      url: https://huggingface.co/spaces/HuggingFaceH4/open_llm_leaderboard?query=victor123/WizardLM-13B-1.0
      name: Open LLM Leaderboard
  - task:
      type: text-generation
      name: Text Generation
    dataset:
      name: Winogrande (5-shot)
      type: winogrande
      config: winogrande_xl
      split: validation
      args:
        num_few_shot: 5
    metrics:
    - type: acc
      value: 49.41
      name: accuracy
    source:
      url: https://huggingface.co/spaces/HuggingFaceH4/open_llm_leaderboard?query=victor123/WizardLM-13B-1.0
      name: Open LLM Leaderboard
  - task:
      type: text-generation
      name: Text Generation
    dataset:
      name: GSM8k (5-shot)
      type: gsm8k
      config: main
      split: test
      args:
        num_few_shot: 5
    metrics:
    - type: acc
      value: 0.0
      name: accuracy
    source:
      url: https://huggingface.co/spaces/HuggingFaceH4/open_llm_leaderboard?query=victor123/WizardLM-13B-1.0
      name: Open LLM Leaderboard
---

These are the diff weights for WizardLM-13B V1.0.

Project Repo: https://github.com/nlpxucan/WizardLM

NOTE: **WizardLM-13B-1.0** and **WizardLM-7B** use different prompts at the beginning of the conversation.

For **WizardLM-13B-1.0**, the prompt should be as follows:

```
A chat between a curious user and an artificial intelligence assistant. The assistant gives helpful, detailed, and polite answers to the user's questions. USER: hello, who are you? ASSISTANT:
```

For **WizardLM-7B**, the prompt should be as follows:

```
{instruction}\n\n### Response:
```
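
The two prompt formats above can be assembled programmatically. The following is a minimal sketch, not code from the WizardLM repository: the helper names `build_prompt_13b` and `build_prompt_7b` are hypothetical, and the single-space separators around `USER:` and `ASSISTANT:` are an assumption read off the example prompt.

```python
# System preamble copied from the WizardLM-13B-1.0 prompt shown above.
SYSTEM_13B = (
    "A chat between a curious user and an artificial intelligence assistant. "
    "The assistant gives helpful, detailed, and polite answers to the user's questions."
)


def build_prompt_13b(user_message: str) -> str:
    """Hypothetical helper: wrap one user turn in the WizardLM-13B-1.0 format.

    Assumes single spaces between the preamble, the USER turn, and ASSISTANT:.
    """
    return f"{SYSTEM_13B} USER: {user_message} ASSISTANT:"


def build_prompt_7b(instruction: str) -> str:
    """Hypothetical helper: wrap an instruction in the WizardLM-7B format."""
    return f"{instruction}\n\n### Response:"


if __name__ == "__main__":
    print(build_prompt_13b("hello, who are you?"))
    print(build_prompt_7b("Write a haiku about autumn."))
```

Keeping the two builders separate makes it harder to send a 7B-style prompt to the 13B model by accident, which is the mix-up the note above warns about.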

🤗 HF Repo • 🐦 Twitter • 📃 [WizardLM] • 📃 [WizardCoder] • 📃 [WizardMath]

👋 Join our Discord

| Model | Checkpoint | Paper | HumanEval | MBPP | Demo | License |
| ----- | ----- | ----- | ----- | ----- | ----- | ----- |
| WizardCoder-Python-34B-V1.0 | 🤗 HF Link | 📃 [WizardCoder] | 73.2 | 61.2 | [Demo](http://47.103.63.15:50085/) | Llama2 |
| WizardCoder-15B-V1.0 | 🤗 HF Link | 📃 [WizardCoder] | 59.8 | 50.6 | -- | OpenRAIL-M |
| WizardCoder-Python-13B-V1.0 | 🤗 HF Link | 📃 [WizardCoder] | 64.0 | 55.6 | -- | Llama2 |
| WizardCoder-3B-V1.0 | 🤗 HF Link | 📃 [WizardCoder] | 34.8 | 37.4 | -- | OpenRAIL-M |
| WizardCoder-1B-V1.0 | 🤗 HF Link | 📃 [WizardCoder] | 23.8 | 28.6 | -- | OpenRAIL-M |

| Model | Checkpoint | Paper | GSM8k | MATH | Online Demo | License |
| ----- | ----- | ----- | ----- | ----- | ----- | ----- |
| WizardMath-70B-V1.0 | 🤗 HF Link | 📃 [WizardMath] | **81.6** | **22.7** | [Demo](http://47.103.63.15:50083/) | Llama 2 |
| WizardMath-13B-V1.0 | 🤗 HF Link | 📃 [WizardMath] | **63.9** | **14.0** | [Demo](http://47.103.63.15:50082/) | Llama 2 |
| WizardMath-7B-V1.0 | 🤗 HF Link | 📃 [WizardMath] | **54.9** | **10.7** | [Demo](http://47.103.63.15:50080/) | Llama 2 |

| Model | Checkpoint | Paper | MT-Bench | AlpacaEval | GSM8k | HumanEval | License |
| ----- | ----- | ----- | ----- | ----- | ----- | ----- | ----- |
| **WizardLM-70B-V1.0** | 🤗 HF Link | 📃 **Coming Soon** | **7.78** | **92.91%** | **77.6%** | **50.6 pass@1** | Llama 2 License |
| WizardLM-13B-V1.2 | 🤗 HF Link | | 7.06 | 89.17% | 55.3% | 36.6 pass@1 | Llama 2 License |
| WizardLM-13B-V1.1 | 🤗 HF Link | | 6.76 | 86.32% | | 25.0 pass@1 | Non-commercial |
| WizardLM-30B-V1.0 | 🤗 HF Link | | 7.01 | | | 37.8 pass@1 | Non-commercial |
| WizardLM-13B-V1.0 | 🤗 HF Link | | 6.35 | 75.31% | | 24.0 pass@1 | Non-commercial |
| WizardLM-7B-V1.0 | 🤗 HF Link | 📃 [WizardLM] | | | | 19.1 pass@1 | Non-commercial |

**Github Repo**: https://github.com/nlpxucan/WizardLM/tree/main/WizardMath

**Twitter**: https://twitter.com/WizardLM_AI/status/1689998428200112128

**Discord**: https://discord.gg/VZjjHtWrKs

## Inference WizardLM Demo Script

We provide WizardLM inference demo code [here](https://github.com/nlpxucan/WizardLM/tree/main/demo).

# [Open LLM Leaderboard Evaluation Results](https://huggingface.co/spaces/HuggingFaceH4/open_llm_leaderboard)

Detailed results can be found [here](https://huggingface.co/datasets/open-llm-leaderboard/details_victor123__WizardLM-13B-1.0).

| Metric                          |Value|
|---------------------------------|----:|
|Avg.                             |29.27|
|AI2 Reasoning Challenge (25-Shot)|28.50|
|HellaSwag (10-Shot)              |25.97|
|MMLU (5-Shot)                    |23.12|
|TruthfulQA (0-shot)              |48.61|
|Winogrande (5-shot)              |49.41|
|GSM8k (5-shot)                   | 0.00|
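
The Avg. value is consistent with an unweighted mean of the six benchmark scores. A quick check, assuming equal weighting (the variable names here are illustrative only):

```python
# Benchmark scores copied from the leaderboard results on this card.
scores = {
    "AI2 Reasoning Challenge (25-Shot)": 28.50,
    "HellaSwag (10-Shot)": 25.97,
    "MMLU (5-Shot)": 23.12,
    "TruthfulQA (0-shot)": 48.61,
    "Winogrande (5-shot)": 49.41,
    "GSM8k (5-shot)": 0.00,
}

# Unweighted mean, rounded to two decimals as reported on the card.
average = round(sum(scores.values()) / len(scores), 2)

if __name__ == "__main__":
    print(f"Avg. = {average:.2f}")
```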