Update README.md
README.md CHANGED
# Original model card: WizardLM's WizardCoder 15B 1.0

This is the Full-Weight of WizardCoder.

**Repository**: https://github.com/nlpxucan/WizardLM/tree/main/WizardCoder

**Twitter**: https://twitter.com/WizardLM_AI/status/1669109414559911937

**Paper**: Coming soon, with brand-new Evol+ methods for code LLMs.

**Demos** (only code-related English instructions are supported for now):

[Demo](https://8194635813f45a1e.gradio.app/),
[Backup Demo1](https://375cead61e4db124.gradio.app/),
[Backup Demo2](https://1594ad375fc80cc7.gradio.app/),
[Backup Demo3](https://4989441110ee350f.gradio.app/)

# WizardCoder: Empowering Code Large Language Models with Evol-Instruct

To develop our WizardCoder model, we begin by adapting the Evol-Instruct method specifically for coding tasks. This involves tailoring the prompt to the domain of code-related instructions. Subsequently, we fine-tune the Code LLM, StarCoder, utilizing the newly created instruction-following training set.

## News

- 🔥 Our **WizardCoder-15B-v1.0** model achieves **57.3 pass@1** on the [HumanEval Benchmarks](https://github.com/openai/human-eval), which is **22.3** points higher than the SOTA open-source Code LLMs.
- 🔥 We released **WizardCoder-15B-v1.0**, trained with **78k** evolved code instructions. Please check out the [Model Weights](https://huggingface.co/WizardLM/WizardCoder-15B-V1.0) and [Paper]().
- 📣 Please follow our Twitter account https://twitter.com/WizardLM_AI and our HuggingFace repo https://huggingface.co/WizardLM . We will announce any new releases there first.

## Comparing WizardCoder with the Closed-Source Models

🔥 The following figure shows that our **WizardCoder attains the third position in this benchmark**, surpassing Claude-Plus (59.8 vs. 53.0) and Bard (59.8 vs. 44.5). Notably, our model is substantially smaller than these models.

<p align="center">
<a ><img src="https://raw.githubusercontent.com/nlpxucan/WizardLM/main/WizardCoder/imgs/pass1.png" alt="WizardCoder" style="width: 86%; min-width: 300px; display: block; margin: auto;"></a>
</p>

❗**Note: In this study, we copy the scores for HumanEval and HumanEval+ from the [LLM-Humaneval-Benchmarks](https://github.com/my-other-github-account/llm-humaneval-benchmarks). Notably, all the mentioned models generate code solutions for each problem with a single attempt, and the resulting pass rate percentage is reported. Our WizardCoder generates answers with greedy decoding and is evaluated with the same [code](https://github.com/evalplus/evalplus).**
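
As a rough illustration of the single-attempt, greedy-decoding setup described in the note above, the sketch below loads the released checkpoint with the Hugging Face `transformers` library and generates one deterministic completion per prompt. The Alpaca-style instruction template shown here is an assumption for illustration only, not necessarily the exact template used during training.

```python
# Minimal sketch: one greedy (deterministic) completion per problem,
# mirroring the "single attempt" protocol described in the note above.
# The instruction template below is an assumed Alpaca-style format,
# not guaranteed to match the training-time template exactly.
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "WizardLM/WizardCoder-15B-V1.0"
tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(
    model_id, torch_dtype=torch.float16, device_map="auto"
)

prompt = (
    "Below is an instruction that describes a task. "
    "Write a response that appropriately completes the request.\n\n"
    "### Instruction:\nWrite a Python function that checks whether a string is a palindrome.\n\n"
    "### Response:"
)

inputs = tokenizer(prompt, return_tensors="pt").to(model.device)
outputs = model.generate(
    **inputs,
    max_new_tokens=256,
    do_sample=False,  # greedy decoding: a single deterministic attempt
)
# Print only the newly generated tokens, not the echoed prompt
print(tokenizer.decode(outputs[0][inputs["input_ids"].shape[1]:], skip_special_tokens=True))
```

Under this protocol, the reported pass rate is simply the fraction of problems whose single completion passes the benchmark's unit tests.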
## Comparing WizardCoder with the Open-Source Models

The following table clearly demonstrates that our **WizardCoder** exhibits a substantial performance advantage over all the open-source models. ❗**If you are confused by the different scores of our model (57.3 and 59.8), please check the Notes.**

| Model | HumanEval Pass@1 | MBPP Pass@1 |
|---|---|---|
| … | … | … |
| WizardLM-30B 1.0 | 37.8 | -- |
| WizardCoder-15B 1.0 | **57.3** | **51.8** |

❗**Note: The reproduced result of StarCoder on MBPP.**

❗**Note: The above table conducts a comprehensive comparison of our WizardCoder with other models on the HumanEval and MBPP benchmarks. We adhere to the approach outlined in previous studies by generating 20 samples for each problem to estimate the pass@1 score and evaluate with the same [code](https://github.com/openai/human-eval/tree/master). The scores of GPT4 and GPT3.5 reported by [OpenAI](https://openai.com/research/gpt-4) are 67.0 and 48.1 (these may be from earlier versions of GPT-4 and GPT-3.5).**
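
The table above estimates pass@1 from 20 samples per problem. For readers unfamiliar with how a pass@k score is derived from such samples, here is a small sketch of the standard unbiased estimator from the HumanEval paper (Chen et al., 2021); the example counts are made up, and the snippet is illustrative rather than the authors' actual evaluation script.

```python
# Sketch of the unbiased pass@k estimator (Chen et al., 2021) that the
# 20-samples-per-problem protocol above relies on; the official
# human-eval repository implements the same formula.
import numpy as np

def pass_at_k(n: int, c: int, k: int) -> float:
    """Unbiased estimate of pass@k for one problem.

    n: total generated samples for the problem (e.g. 20)
    c: number of samples that pass all unit tests
    k: the k in pass@k (e.g. 1)
    """
    if n - c < k:
        return 1.0
    # 1 - C(n - c, k) / C(n, k), computed in a numerically stable way
    return 1.0 - np.prod(1.0 - k / np.arange(n - c + 1, n + 1))

# Hypothetical per-problem pass counts out of 20 samples each
pass_counts = [20, 3, 0, 11, 7]
scores = [pass_at_k(20, c, 1) for c in pass_counts]
print(f"estimated pass@1 = {np.mean(scores):.3f}")
```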
## Call for Feedbacks

We welcome everyone to use professional and difficult instructions to evaluate WizardCoder, and to show us examples of poor performance along with your suggestions in the [issue discussion](https://github.com/nlpxucan/WizardLM/issues) area. We are focusing on improving Evol-Instruct now and hope to relieve existing weaknesses and issues in the next version of WizardCoder. After that, we will open-source the code and pipeline of the up-to-date Evol-Instruct algorithm and work with you together to improve it.

We will provide our latest models for you to try for as long as possible. If you find a link is not working, please try another one. At the same time, please try as many **real-world** and **challenging** code-related problems that you encounter in your work and life as possible. We will continue to evolve our models with your feedback.

## Fine-tuning