---
license: llama2
---

These are the **full weights** of the WizardLM-13B V1.2 model, which is trained from **Llama-2 13b**.

## WizardLM: Empowering Large Pre-Trained Language Models to Follow Complex Instructions

🤗 HF Repo •🐱 Github Repo • 🐦 Twitter • 📃 [WizardLM] • 📃 [WizardCoder] • 📃 [WizardMath]

👋 Join our Discord

## News

- 🔥🔥🔥 [2023/08/26] We released **WizardCoder-Python-34B-V1.0**, which achieves **73.2 pass@1** and surpasses **GPT4 (2023/03/15)**, **ChatGPT-3.5**, and **Claude2** on the [HumanEval Benchmarks](https://github.com/openai/human-eval). For more details, please refer to [WizardCoder](https://github.com/nlpxucan/WizardLM/tree/main/WizardCoder).
- [2023/06/16] We released **WizardCoder-15B-V1.0**, which surpasses **Claude-Plus (+6.8)**, **Bard (+15.3)** and **InstructCodeT5+ (+22.3)** on the [HumanEval Benchmarks](https://github.com/openai/human-eval). For more details, please refer to [WizardCoder](https://github.com/nlpxucan/WizardLM/tree/main/WizardCoder).

| Model | Checkpoint | Paper | HumanEval | MBPP | Demo | License |
| ----- | ----- | ----- | ----- | ----- | ----- | ----- |
| WizardCoder-Python-34B-V1.0 | 🤗 HF Link | 📃 [WizardCoder] | 73.2 | 61.2 | [Demo](http://47.103.63.15:50085/) | Llama2 |
| WizardCoder-15B-V1.0 | 🤗 HF Link | 📃 [WizardCoder] | 59.8 | 50.6 | -- | OpenRAIL-M |
| WizardCoder-Python-13B-V1.0 | 🤗 HF Link | 📃 [WizardCoder] | 64.0 | 55.6 | -- | Llama2 |
| WizardCoder-Python-7B-V1.0 | 🤗 HF Link | 📃 [WizardCoder] | 55.5 | 51.6 | [Demo](http://47.103.63.15:50088/) | Llama2 |
| WizardCoder-3B-V1.0 | 🤗 HF Link | 📃 [WizardCoder] | 34.8 | 37.4 | -- | OpenRAIL-M |
| WizardCoder-1B-V1.0 | 🤗 HF Link | 📃 [WizardCoder] | 23.8 | 28.6 | -- | OpenRAIL-M |

- 🔥 [08/11/2023] We released the **WizardMath** models.
- 🔥 Our **WizardMath-70B-V1.0** model slightly outperforms some closed-source LLMs on GSM8K, including **ChatGPT 3.5**, **Claude Instant 1** and **PaLM 2 540B**.
- 🔥 Our **WizardMath-70B-V1.0** model achieves **81.6 pass@1** on the [GSM8k Benchmarks](https://github.com/openai/grade-school-math), which is **24.8** points higher than the SOTA open-source LLM.
- 🔥 Our **WizardMath-70B-V1.0** model achieves **22.7 pass@1** on the [MATH Benchmarks](https://github.com/hendrycks/math), which is **9.2** points higher than the SOTA open-source LLM.

| Model | Checkpoint | Paper | GSM8k | MATH | Online Demo | License |
| ----- | ----- | ----- | ----- | ----- | ----- | ----- |
| WizardMath-70B-V1.0 | 🤗 HF Link | 📃 [WizardMath] | **81.6** | **22.7** | [Demo](http://47.103.63.15:50083/) | Llama 2 |
| WizardMath-13B-V1.0 | 🤗 HF Link | 📃 [WizardMath] | **63.9** | **14.0** | [Demo](http://47.103.63.15:50082/) | Llama 2 |
| WizardMath-7B-V1.0 | 🤗 HF Link | 📃 [WizardMath] | **54.9** | **10.7** | [Demo](http://47.103.63.15:50080/) | Llama 2 |

| Model | Checkpoint | Paper | MT-Bench | AlpacaEval | WizardEval | HumanEval | License |
| ----- | ----- | ----- | ----- | ----- | ----- | ----- | ----- |
| WizardLM-13B-V1.2 | 🤗 HF Link | | 7.06 | 89.17% | 101.4% | 36.6 pass@1 | Llama 2 License |
| WizardLM-13B-V1.1 | 🤗 HF Link | | 6.76 | 86.32% | 99.3% | 25.0 pass@1 | Non-commercial |
| WizardLM-30B-V1.0 | 🤗 HF Link | | 7.01 | | 97.8% | 37.8 pass@1 | Non-commercial |
| WizardLM-13B-V1.0 | 🤗 HF Link | | 6.35 | 75.31% | 89.1% | 24.0 pass@1 | Non-commercial |
| WizardLM-7B-V1.0 | 🤗 HF Link | 📃 [WizardLM] | | | 78.0% | 19.1 pass@1 | Non-commercial |

**Repository**: https://github.com/nlpxucan/WizardLM

**Twitter**:

- 🔥🔥🔥 [7/25/2023] We released the **WizardLM V1.2** models. The **WizardLM-13B-V1.2** is here ([Demo_13B-V1.2](https://b7a19878988c8c73.gradio.app), [Demo_13B-V1.2_bak-1](https://d0a37a76e0ac4b52.gradio.app/), [Full Model Weight](https://huggingface.co/WizardLM/WizardLM-13B-V1.2)). Please check out the [paper](https://arxiv.org/abs/2304.12244).
- 🔥🔥🔥 [7/25/2023] The **WizardLM-13B-V1.2** achieves **7.06** on the [MT-Bench Leaderboard](https://chat.lmsys.org/?leaderboard), **89.17%** on the [AlpacaEval Leaderboard](https://tatsu-lab.github.io/alpaca_eval/), and **101.4%** on [WizardLM Eval](https://github.com/nlpxucan/WizardLM/blob/main/WizardLM/data/WizardLM_testset.jsonl). (Note: the MT-Bench and AlpacaEval results are self-tested; we will push updates and request review. All tests were completed under their official settings.)

❗Note on system prompt usage:

WizardLM adopts the prompt format from Vicuna and supports **multi-turn** conversation.

The prompt should be as follows:

```
A chat between a curious user and an artificial intelligence assistant. The assistant gives helpful, detailed, and polite answers to the user's questions. USER: Hi ASSISTANT: Hello.USER: Who are you? ASSISTANT: I am WizardLM.......
```

## Inference WizardLM Demo Script

We provide the WizardLM inference demo code [here](https://github.com/nlpxucan/WizardLM/tree/main/demo).

Please cite the paper if you use the data or code from WizardLM.

```
@article{xu2023wizardlm,
  title={Wizardlm: Empowering large language models to follow complex instructions},
  author={Xu, Can and Sun, Qingfeng and Zheng, Kai and Geng, Xiubo and Zhao, Pu and Feng, Jiazhan and Tao, Chongyang and Jiang, Daxin},
  journal={arXiv preprint arXiv:2304.12244},
  year={2023}
}
```

❗Regarding common concerns about the dataset:

Recently, there have been clear changes in the open-source policy and regulations of our overall organization's code, data, and models. Despite this, we have worked hard to get approval to open the model weights first, but the data involves stricter auditing and is under review by our legal team. Our researchers have no authority to release it publicly without authorization. Thank you for your understanding.

# [Open LLM Leaderboard Evaluation Results](https://huggingface.co/spaces/HuggingFaceH4/open_llm_leaderboard)

Detailed results can be found [here](https://huggingface.co/datasets/open-llm-leaderboard/details_WizardLM__WizardLM-13B-V1.2)

| Metric                | Value                     |
|-----------------------|---------------------------|
| Avg.                  | 49.25                     |
| ARC (25-shot)         | 59.04                     |
| HellaSwag (10-shot)   | 82.21                     |
| MMLU (5-shot)         | 54.64                     |
| TruthfulQA (0-shot)   | 47.27                     |
| Winogrande (5-shot)   | 71.9                      |
| GSM8K (5-shot)        | 13.5                      |
| DROP (3-shot)         | 16.17                     |
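
The **Avg.** row in the leaderboard table above appears to be the unweighted mean of the seven benchmark scores. This is an observation from the numbers themselves, not a statement of how the leaderboard defines its average. A quick check, with the scores copied from the table:

```python
# Scores copied from the Open LLM Leaderboard table above.
scores = {
    "ARC (25-shot)": 59.04,
    "HellaSwag (10-shot)": 82.21,
    "MMLU (5-shot)": 54.64,
    "TruthfulQA (0-shot)": 47.27,
    "Winogrande (5-shot)": 71.9,
    "GSM8K (5-shot)": 13.5,
    "DROP (3-shot)": 16.17,
}

# Assumption: "Avg." is a plain arithmetic mean over all seven metrics.
average = sum(scores.values()) / len(scores)
print(f"{average:.2f}")  # 49.25
```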
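
The Vicuna-style prompt format described in the system prompt note earlier in this card can be assembled programmatically. The following is a minimal sketch, not the official demo code: `build_prompt`, `SYSTEM_PROMPT`, and the `turn_sep` parameter are hypothetical names introduced here for illustration. `turn_sep` defaults to an empty string so the output mirrors the example as shown in this card; Vicuna-style templates often place the tokenizer's end-of-sequence token after each assistant reply, so treat the right value as an assumption to verify against the demo script.

```python
from typing import List, Optional, Tuple

# System prompt text taken verbatim from the example in this card.
SYSTEM_PROMPT = (
    "A chat between a curious user and an artificial intelligence assistant. "
    "The assistant gives helpful, detailed, and polite answers to the user's questions."
)


def build_prompt(
    turns: List[Tuple[str, Optional[str]]],
    system: str = SYSTEM_PROMPT,
    turn_sep: str = "",
) -> str:
    """Build a multi-turn prompt (hypothetical helper, not part of WizardLM).

    turns: (user_message, assistant_reply) pairs; use None as the reply of
           the final turn to leave the prompt open for generation.
    turn_sep: string appended after each completed assistant reply
              (assumption: may need to be the model's EOS token).
    """
    prompt = system + " "
    for user, assistant in turns:
        prompt += f"USER: {user} ASSISTANT:"
        if assistant is not None:
            prompt += f" {assistant}{turn_sep}"
    return prompt


# One completed exchange, then a new user question awaiting a reply.
history = [("Hi", "Hello."), ("Who are you?", None)]
print(build_prompt(history))
```

The resulting string ends with `ASSISTANT:`, so the model's continuation is the next assistant reply; appending that reply to `history` and adding the next user message extends the conversation.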