---
language: code
tags:
- code
- gpt2
- generation
datasets:
- giulio98/xlcost-single-prompt
widget:
- text: "'''\nfunction to add two numbers\n'''\n###\n"
  example_title: "add two numbers"
model-index:
- name: codegen-350M-multi-xlcost
  results:
  - task:
      name: Code Generation
      type: code-generation
    dataset:
      name: "XLCost"
      type: code_eval_outputs
    metrics:
    - name: pass@1
      type: code_eval_outputs
      value: 3.70
    - name: pass@10
      type: code_eval_outputs
      value: 14.5
---

# CodeGen-350M-multi-xlcost

CodeGen-350M-multi-xlcost is a CodeGen model fine-tuned on the Python split of the XLCost dataset.

## Usage

You can load the CodeGen-350M-multi-xlcost model and tokenizer directly with `transformers`:

```Python
from transformers import AutoTokenizer, AutoModelForCausalLM

tokenizer = AutoTokenizer.from_pretrained("giulio98/codegen-350M-multi-xlcost")
model = AutoModelForCausalLM.from_pretrained("giulio98/codegen-350M-multi-xlcost")

# Prompt layout: EOS token, a task description wrapped in ''' quotes, then a "###" separator line
text = tokenizer.eos_token + "'''\n" + "function to add two numbers" + "\n'''\n" + "###\n"
input_ids = tokenizer(text, return_tensors="pt").input_ids

# Generate up to 128 tokens in total (the prompt tokens count toward max_length)
generated_ids = model.generate(input_ids, max_length=128)
print(tokenizer.decode(generated_ids[0], skip_special_tokens=True))
```

Output:

```Python
'''
function to add two numbers
'''
###
def add(a, b):
    return a + b
```
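
The prompt in the example follows a fixed layout: a task description wrapped in `'''` quotes, followed by a `###` separator line, after which the model writes code (the widget prompt uses the same layout without the leading EOS token). A minimal sketch for building such prompts and stripping them from the decoded text could look like this; `build_prompt` and `extract_code` are hypothetical helper names, not part of `transformers` or this model's repository.

```python
# Hypothetical helpers mirroring the prompt layout used in the Usage example.
PROMPT_TEMPLATE = "'''\n{description}\n'''\n###\n"
SEPARATOR = "###\n"


def build_prompt(description: str, eos_token: str) -> str:
    """Wrap a natural-language task in the docstring + '###' layout.

    Pass tokenizer.eos_token as eos_token to match the Usage example.
    """
    return eos_token + PROMPT_TEMPLATE.format(description=description)


def extract_code(decoded: str) -> str:
    """Return everything after the first '###' separator line (the generated code).

    If the separator is missing, the decoded text is returned unchanged.
    """
    _, sep, code = decoded.partition(SEPARATOR)
    return code if sep else decoded


if __name__ == "__main__":
    sample = "'''\nfunction to add two numbers\n'''\n###\ndef add(a, b):\n    return a + b\n"
    print(extract_code(sample))
```

With the Usage snippet, `build_prompt("function to add two numbers", tokenizer.eos_token)` produces the same string as `text`. Since generation runs until `max_length` or an EOS token, the extracted code may still need further truncation in practice.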

## Training

The model was fine-tuned on [XLCost-single-prompt](https://huggingface.co/datasets/giulio98/xlcost-single-prompt), an improved version of the original XLCost dataset [xlcost-text-to-code](https://huggingface.co/datasets/codeparrot/xlcost-text-to-code). The hyperparameters are listed below.

| Hyperparameter | Value |
|-----------------------------|---------|
| Per-device train batch size | 8 |
| Context size (tokens) | 1024 |
| Training steps | 258 |
| Gradient accumulation steps | 4 |
| Gradient checkpointing | True |
| Learning rate | 1.8e-05 |
| Weight decay | 0.0 |
| Warmup steps | 10 |
| Schedule | linear |
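
The table lists a linear schedule with 10 warmup steps over 258 training steps. Assuming this is the usual linear-warmup-then-linear-decay schedule (the one implemented by `get_linear_schedule_with_warmup` in `transformers`), the learning rate at each optimizer step can be sketched in plain Python as follows; the function names are illustrative.

```python
# Sketch of a linear warmup + linear decay schedule, assuming the table's
# "linear" schedule behaves like transformers' get_linear_schedule_with_warmup.
PEAK_LR = 1.8e-05   # "Learning rate" from the table
WARMUP_STEPS = 10   # "Warmup steps" from the table
TOTAL_STEPS = 258   # "Training steps" from the table


def lr_multiplier(step: int, warmup: int = WARMUP_STEPS, total: int = TOTAL_STEPS) -> float:
    """Factor applied to the peak learning rate at a given optimizer step."""
    if step < warmup:
        # Ramp up linearly from 0 to 1 during warmup.
        return step / max(1, warmup)
    # Decay linearly from 1 (end of warmup) to 0 (last step), never below 0.
    return max(0.0, (total - step) / max(1, total - warmup))


def lr_at(step: int) -> float:
    """Learning rate used at a given optimizer step."""
    return PEAK_LR * lr_multiplier(step)


if __name__ == "__main__":
    for step in (0, 5, 10, 134, 258):
        print(f"step {step:3d}: lr = {lr_at(step):.2e}")
```

Under this assumption the rate peaks at 1.8e-05 at step 10, is back to half of the peak at step 134, and reaches zero at step 258.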

Training ran on a single V100 (16 GB) GPU and took 6 h 42 min.
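
With a single GPU, the effective batch size and the amount of data seen during fine-tuning follow directly from the hyperparameter table. The arithmetic below assumes that "Training steps" counts optimizer updates (each accumulating 4 micro-batches) and that every sequence fills the whole 1024-token context, so the token count is an upper bound rather than a measured figure.

```python
# Back-of-the-envelope training budget derived from the hyperparameter table.
# Assumption: 258 steps are optimizer updates; sequences are fully packed.
per_device_batch = 8
grad_accum = 4
num_gpus = 1          # single V100
context_size = 1024
train_steps = 258

effective_batch = per_device_batch * grad_accum * num_gpus  # sequences per update
sequences_seen = effective_batch * train_steps
max_tokens_seen = sequences_seen * context_size             # upper bound

print(effective_batch, sequences_seen, max_tokens_seen)  # 32 8256 8454144
```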

## Performance

We evaluated the model on the first 400 samples of the [XLCost-single-prompt test split](https://huggingface.co/datasets/giulio98/xlcost-single-prompt/viewer/Python/test), comparing the output of each generated program with the expected output using the pass@k metric.
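
The comparison against the expected output is execution-based, but the card does not describe the harness itself. The sketch below shows one hypothetical way to run a generated Python program and compare its standard output using only the standard library; `output_matches` is an illustrative name, and model-generated code should only be executed inside an isolated sandbox.

```python
import subprocess
import sys


def output_matches(program: str, expected_output: str, timeout: float = 10.0) -> bool:
    """Run a generated Python program and compare its stdout with the expected output.

    Hypothetical helper, not the evaluation harness used for this card.
    """
    try:
        result = subprocess.run(
            [sys.executable, "-c", program],
            capture_output=True,
            text=True,
            timeout=timeout,
        )
    except subprocess.TimeoutExpired:
        return False  # programs that hang count as failures
    if result.returncode != 0:
        return False  # programs that crash count as failures
    # Ignore leading/trailing whitespace when comparing outputs.
    return result.stdout.strip() == expected_output.strip()


if __name__ == "__main__":
    print(output_matches("def add(a, b):\n    return a + b\nprint(add(2, 3))", "5"))
```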

| Metric | codegen-350M-multi-xlcost | codegen-350M-mono (zero-shot) | codegen-350M-mono (one-shot) | codegen-350M-mono (few-shot) |
|---------|--------|-------|-------|-------|
| pass@1 | 3.70% | 0.4% | 0.35% | 0.48% |
| pass@10 | 14.5% | 3.5% | 3% | 3.75% |

The [pass@k metric](https://huggingface.co/metrics/code_eval) estimates the probability that at least one of k generated samples passes the tests.
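
The linked `code_eval` metric computes pass@k with the unbiased estimator introduced by Chen et al. (2021): for a problem with n generated samples, of which c pass, pass@k = 1 - C(n-c, k) / C(n, k), averaged over problems. The card does not state how many samples were drawn per problem, so the values of n below are illustrative only.

```python
from math import comb


def pass_at_k(n: int, c: int, k: int) -> float:
    """Unbiased pass@k estimate for a single problem.

    n: samples generated, c: samples that pass, k: sampling budget.
    1 - C(n - c, k) / C(n, k) is one minus the probability that a random
    subset of k samples contains no passing sample.
    """
    if n - c < k:
        return 1.0  # every size-k subset contains at least one passing sample
    return 1.0 - comb(n - c, k) / comb(n, k)


def mean_pass_at_k(results: list[tuple[int, int]], k: int) -> float:
    """Average pass@k over problems, given one (n, c) pair per problem."""
    return sum(pass_at_k(n, c, k) for n, c in results) / len(results)


if __name__ == "__main__":
    # Two illustrative problems: 1 of 20 samples passes, 0 of 10 samples pass.
    print(mean_pass_at_k([(20, 1), (10, 0)], k=10))
```

For example, with 20 samples of which exactly one passes, pass@10 is 0.5, because half of all 10-sample subsets contain the passing sample.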

## Citations

```
@article{Nijkamp2022ACP,
  title={A Conversational Paradigm for Program Synthesis},
  author={Nijkamp, Erik and Pang, Bo and Hayashi, Hiroaki and Tu, Lifu and Wang, Huan and Zhou, Yingbo and Savarese, Silvio and Xiong, Caiming},
  journal={arXiv preprint},
  year={2022}
}
```