|
---
language: code
tags:
- code
- gpt2
- generation
datasets:
- giulio98/xlcost-single-prompt
widget:
- text: "'''\nfunction to add two numbers\n'''\n###\n"
  example_title: "add two numbers"
model-index:
- name: codegen-350M-multi-xlcost
  results:
  - task:
      name: Code Generation
      type: code-generation
    dataset:
      name: "XLCost"
      type: code_eval_outputs
    metrics:
    - name: pass@1
      type: code_eval_outputs
      value: 3.325
    - name: pass@10
      type: code_eval_outputs
      value: 15
    - name: codebleu
      type: codebleu
      value: 20.18191
---
|
|
|
# CodeGen-350M-multi-xlcost-v2 |
|
|
|
CodeGen-350M-multi-xlcost-v2 is a CodeGen model fine-tuned on the Python split of the XLCost dataset using DeepSpeed.
|
|
|
## Usage |
|
|
|
You can load the CodeGen-350M-multi-xlcost-v2 model and tokenizer directly in `transformers`: |
|
|
|
```Python
from transformers import AutoTokenizer, AutoModelForCausalLM

tokenizer = AutoTokenizer.from_pretrained("giulio98/codegen-350M-multi-xlcost-v2")
model = AutoModelForCausalLM.from_pretrained("giulio98/codegen-350M-multi-xlcost-v2")

# The prompt format is an EOS token, a docstring-style description and the "###" separator.
text = tokenizer.eos_token + "'''\n" + "function to add two numbers" + "\n'''\n" + "###\n"
input_ids = tokenizer(text, return_tensors="pt").input_ids

generated_ids = model.generate(input_ids, max_length=128)
print(tokenizer.decode(generated_ids[0], skip_special_tokens=True))
```
|
Output: |
|
```Python
'''
function to add two numbers
'''
###
def add(a, b):
    return a + b
```
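
The pass@k numbers reported below rely on several generations per prompt. If you want multiple candidate completions, you can sample instead of decoding greedily. The sketch below reuses `model`, `tokenizer`, and `input_ids` from the snippet above; the sampling parameters are illustrative and are not necessarily the settings used for the reported metrics.

```Python
import torch

torch.manual_seed(0)

# Sample several candidate completions for the same prompt.
generated_ids = model.generate(
    input_ids,
    do_sample=True,
    temperature=0.8,
    top_p=0.95,
    max_length=128,
    num_return_sequences=5,
    pad_token_id=tokenizer.eos_token_id,
)
for candidate in tokenizer.batch_decode(generated_ids, skip_special_tokens=True):
    print(candidate)
    print("-" * 40)
```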
|
## Training |
|
|
|
The model was fine-tuned on [XLCost-single-prompt](https://huggingface.co/datasets/giulio98/xlcost-single-prompt), an improved version of the original XLCost dataset [xlcost-text-to-code](https://huggingface.co/datasets/codeparrot/xlcost-text-to-code). A sketch of loading the dataset is shown below, followed by the fine-tuning hyperparameters.
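
A minimal loading sketch with the `datasets` library; the configuration name `"Python"` is assumed from the dataset viewer and may differ.

```Python
from datasets import load_dataset

# Load the Python configuration of the fine-tuning dataset
# (config name assumed from the dataset viewer).
dataset = load_dataset("giulio98/xlcost-single-prompt", "Python", split="train")
print(dataset)
print(dataset[0].keys())
```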
|
|
|
| Hyperparameter              | Value   |
|-----------------------------|---------|
| Per-device train batch size | 16      |
| Context size                | 1024    |
| Training steps              | 259     |
| Gradient accumulation steps | 2       |
| Gradient checkpointing      | True    |
| Learning rate               | 1.8e-05 |
| Weight decay                | 0.1     |
| Warmup steps                | 35      |
| Schedule                    | linear  |
| ZeRO stage                  | 2       |
|
|
|
The DeepSpeed configuration is shown below:
|
```json
{
    "fp16": {
        "enabled": true,
        "loss_scale": 0,
        "loss_scale_window": 1000,
        "initial_scale_power": 16,
        "hysteresis": 2,
        "min_loss_scale": 1
    },
    "optimizer": {
        "type": "AdamW",
        "params": {
            "lr": 0.000018,
            "betas": [0.9, 0.999],
            "eps": 1e-8,
            "weight_decay": 0.1
        }
    },
    "scheduler": {
        "type": "WarmupLR",
        "params": {
            "warmup_min_lr": 0,
            "warmup_max_lr": 0.000018,
            "warmup_num_steps": 35
        }
    },
    "zero_optimization": {
        "stage": 2,
        "offload_optimizer": {
            "device": "cpu",
            "pin_memory": false
        },
        "allgather_partitions": true,
        "allgather_bucket_size": 200000000,
        "overlap_comm": true,
        "reduce_scatter": true,
        "reduce_bucket_size": 200000000,
        "contiguous_gradients": true
    },
    "gradient_accumulation_steps": 2,
    "train_batch_size": 32,
    "train_micro_batch_size_per_gpu": 16,
    "gradient_clipping": 1,
    "wall_clock_breakdown": false
}
```
|
|
|
Training ran on a single V100 (16 GB) GPU and took 28 min 50 s.
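
For illustration only, the hyperparameters and DeepSpeed configuration above roughly correspond to a `transformers` `Trainer` setup such as the sketch below. The base checkpoint (`Salesforce/codegen-350M-multi`), the config file name `ds_config.json`, and the `tokenized_train` variable are assumptions; dataset tokenization is omitted.

```Python
from transformers import (
    AutoModelForCausalLM,
    AutoTokenizer,
    Trainer,
    TrainingArguments,
)

# Assumed base checkpoint; this card only states that it is a CodeGen model.
tokenizer = AutoTokenizer.from_pretrained("Salesforce/codegen-350M-multi")
model = AutoModelForCausalLM.from_pretrained("Salesforce/codegen-350M-multi")

training_args = TrainingArguments(
    output_dir="codegen-350M-multi-xlcost-v2",
    per_device_train_batch_size=16,
    gradient_accumulation_steps=2,
    gradient_checkpointing=True,
    max_steps=259,
    learning_rate=1.8e-5,
    weight_decay=0.1,
    warmup_steps=35,
    lr_scheduler_type="linear",
    fp16=True,
    deepspeed="ds_config.json",  # the DeepSpeed configuration shown above
)

# `tokenized_train` is a placeholder for the XLCost-single-prompt train split,
# tokenized and chunked to the 1024-token context size.
trainer = Trainer(
    model=model,
    args=training_args,
    train_dataset=tokenized_train,
)
trainer.train()
```

With a micro-batch size of 16 and 2 gradient-accumulation steps on one GPU, the effective batch size matches the `train_batch_size` of 32 in the DeepSpeed config.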
|
|
|
## Performance |
|
|
|
We evaluated the model on the first 400 samples of the [XLCost-single-prompt test split](https://huggingface.co/datasets/giulio98/xlcost-single-prompt/viewer/Python/test), comparing the output of the generated code against the expected output using the pass@k metric.
|
|
|
| Metric   | codegen-350M-multi-xlcost-v2 | codegen-350M-multi-xlcost | codegen-350M-mono (zero-shot) | codegen-350M-mono (one-shot) | codegen-350M-mono (few-shot) |
|----------|------------------------------|---------------------------|-------------------------------|------------------------------|------------------------------|
| pass@1   | 3.325%                       | 3.70%                     | 0.4%                          | 0.35%                        | 0.48%                        |
| pass@10  | 15%                          | 14.5%                     | 3.5%                          | 3%                           | 3.75%                        |
| CodeBLEU | 20.18%                       | None                      | 15.15%                        | 19.42%                       | 20.27%                       |
|
|
|
The [pass@k metric](https://huggingface.co/metrics/code_eval) gives the probability that at least one out of k generations passes the tests.
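
For reference, pass@k can be computed with the `code_eval` metric from the `evaluate` library. The toy example below is a sketch and is not the exact harness used to produce the numbers above.

```Python
import os

import evaluate

# code_eval executes untrusted, model-generated code; it must be enabled explicitly.
os.environ["HF_ALLOW_CODE_EVAL"] = "1"

code_eval = evaluate.load("code_eval")

# Toy example: one problem, two candidate completions, one test case.
candidates = [["def add(a, b):\n    return a + b", "def add(a, b):\n    return a - b"]]
test_cases = ["assert add(2, 3) == 5"]

pass_at_k, results = code_eval.compute(
    references=test_cases,
    predictions=candidates,
    k=[1, 2],
)
print(pass_at_k)  # e.g. {'pass@1': 0.5, 'pass@2': 1.0}
```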
|
|
|
## Citations |
|
```
@article{Nijkamp2022ACP,
  title={A Conversational Paradigm for Program Synthesis},
  author={Nijkamp, Erik and Pang, Bo and Hayashi, Hiroaki and Tu, Lifu and Wang, Huan and Zhou, Yingbo and Savarese, Silvio and Xiong, Caiming},
  journal={arXiv preprint},
  year={2022}
}
```