File size: 4,295 Bytes
0153eac 22dc2a5 0153eac 9e4065f 0153eac 22dc2a5 0153eac |
1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36 37 38 39 40 41 42 43 44 45 46 47 48 49 50 51 52 53 54 55 56 57 58 59 60 61 62 63 64 65 66 67 68 69 70 71 72 73 74 75 76 77 78 79 80 81 82 83 84 85 86 87 88 89 90 91 92 93 94 95 96 97 98 99 100 101 102 103 104 105 106 107 108 109 110 111 112 113 114 115 116 117 118 119 120 121 122 123 124 125 126 127 128 129 130 131 132 133 134 135 136 137 138 139 140 141 142 143 144 145 146 147 148 149 150 151 152 153 |
---
language: code
tags:
- code
- gpt2
- generation
datasets:
- giulio98/xlcost-single-prompt
widget:
- text: "'''\nfunction to add two numbers\n'''\n###\n"
example_title: "add two numbers"
model-index:
- name: codegen-350M-multi-xlcost
results:
- task:
name: Code Generation
type: code-generation
dataset:
name: "XLCost"
type: code_eval_outputs
metrics:
- name: pass@1
type: code_eval_outputs
value: 3.325
- name: pass@10
type: code_eval_outputs
value: 15
- name: codebleu
type: codebleu
value: 20.18191
---
# CodeGen-350M-multi-xlcost-v2
CodeGen-350M-multi-xlcost is a CodeGen model fine-tuned on the Python split of XLCost dataset using Deepspeed.
## Usage
You can load the CodeGen-350M-multi-xlcost-v2 model and tokenizer directly in `transformers`:
```Python
from transformers import AutoTokenizer, AutoModelForCausalLM
tokenizer = AutoTokenizer.from_pretrained("giulio98/codegen-350M-multi-xlcost-v2")
model = AutoModelForCausalLM.from_pretrained("giulio98/codegen-350M-multi-xlcost-v2")
text = tokenizer.eos_token + "\'\'\'\n" + "function to add two numbers" + "\n\'\'\'\n" + "###\n"
input_ids = tokenizer(text, return_tensors="pt").input_ids
generated_ids = model.generate(input_ids, max_length=128)
print(tokenizer.decode(generated_ids[0], skip_special_tokens=True))
```
Output:
```Python
'''
function to add two numbers
'''
###
def add(a, b):
return a + b
```
## Training
The model was finetuned on [XLCost-single-prompt](https://huggingface.co/datasets/giulio98/xlcost-single-prompt), an improved version of the original XLCost dataset [
xlcost-text-to-code](https://huggingface.co/datasets/codeparrot/xlcost-text-to-code). Below the hyperparameters.
| Hyperparameter | value |
|---------------------------|--------|
|Per device train batch size| 16 |
|Context size| 1024 |
|Training steps| 259|
|Gradient accumulation| 2|
|Gradient checkpointing| True|
|Learning rate|1.8e-05 |
|Weight decay | 0.1 |
|Warmup steps| 35 |
|Schedule| linear |
|zero stage| 2 |
Below the deepspeed configuration
```Python
{
"fp16": {
"enabled": true,
"loss_scale": 0,
"loss_scale_window": 1000,
"initial_scale_power": 16,
"hysteresis": 2,
"min_loss_scale": 1
},
"optimizer": {
"type": "AdamW",
"params": {
"lr": 0.000018,
"betas": [
0.9,
0.999
],
"eps": 1e-8,
"weight_decay": 0.1
}
},
"scheduler": {
"type": "WarmupLR",
"params": {
"warmup_min_lr": 0,
"warmup_max_lr": 0.000018,
"warmup_num_steps": 35
}
},
"zero_optimization": {
"stage": 2,
"offload_optimizer": {
"device": "cpu",
"pin_memory": false
},
"allgather_partitions": true,
"allgather_bucket_size": 200000000,
"overlap_comm": true,
"reduce_scatter": true,
"reduce_bucket_size": 200000000,
"contiguous_gradients": true
},
"gradient_accumulation_steps": 2,
"train_batch_size": 32,
"train_micro_batch_size_per_gpu": 16,
"gradient_clipping": 1,
"wall_clock_breakdown": false
}
```
The training was executed on 1 x V100 (16GB) GPU for 28min 50sec
## Performance
We evaluated the model on the first 400 samples of XLCOST's [XLCost-single-prompt test split](https://huggingface.co/datasets/giulio98/xlcost-single-prompt/viewer/Python/test) and comparing the outputs of the generated codes with respect to the expected output using pass@k metric.
| Metric | codegen-350M-multi-xlcost-v2 | codegen-350M-multi-xlcost | codegen-350M-mono(zero-shot) | codegen-350M-mono (one-shot) | codegen-350M-mono(few-shot)
|--------|-----|-----|-----|-----|-----|
|pass@1 |3.325% |3.70% | 0.4% | 0.35% | 0.48% |
|pass@10 |15%| 14.5% | 3.5% | 3 % | 3.75% |
|CodeBLEU |20.18%| None | 15.15% | 19.42 % | 20.27% |
The [pass@k metric](https://huggingface.co/metrics/code_eval) tells the probability that at least one out of k generations passes the tests.
## Citations
```
@article{Nijkamp2022ACP,
title={A Conversational Paradigm for Program Synthesis},
author={Nijkamp, Erik and Pang, Bo and Hayashi, Hiroaki and Tu, Lifu and Wang, Huan and Zhou, Yingbo and Savarese, Silvio and Xiong, Caiming},
journal={arXiv preprint},
year={2022}
}
``` |