---
language: code
tags:
- code
- gpt2
- generation
datasets:
- giulio98/xlcost-single-prompt
widget:
- text: "'''\nfunction to add two numbers\n'''\n###\n"
  example_title: "add two numbers"
model-index:
- name: codegen-350M-multi-xlcost
  results:
  - task:
      name: Code Generation
      type: code-generation
    dataset:
      name: "XLCost"
      type: code_eval_outputs
    metrics:
    - name: pass@1
      type: code_eval_outputs
      value: 3.70
    - name: pass@10
      type: code_eval_outputs
      value: 14.5
---
# CodeGen-350M-multi-xlcost
CodeGen-350M-multi-xlcost is a CodeGen model fine-tuned on the Python split of the XLCost dataset.
## Usage
You can load the CodeGen-350M-multi-xlcost model and tokenizer directly in `transformers`:
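```python
from transformers import AutoTokenizer, AutoModelForCausalLM

tokenizer = AutoTokenizer.from_pretrained("giulio98/codegen-350M-multi-xlcost")
model = AutoModelForCausalLM.from_pretrained("giulio98/codegen-350M-multi-xlcost")

# Build the prompt: EOS token, the task description wrapped in triple quotes,
# then a "###" separator line after which the model writes the code
text = tokenizer.eos_token + "'''\n" + "function to add two numbers" + "\n'''\n" + "###\n"
input_ids = tokenizer(text, return_tensors="pt").input_ids

# max_length counts the prompt tokens as well as the generated ones
generated_ids = model.generate(input_ids, max_length=128)
print(tokenizer.decode(generated_ids[0], skip_special_tokens=True))
```

Output (the decoded sequence echoes the prompt, without the EOS token):

```python
'''
function to add two numbers
'''
###
def add(a, b):
    return a + b
```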
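The prompt in the usage example follows a fixed template, and the decoded output repeats that prompt before the generated code. A minimal sketch of two hypothetical helpers (`build_prompt` and `extract_code` are illustrative names, not part of `transformers` or of this model's repository) that build the prompt and keep only the code after the first `###` separator; `eos_token` is assumed to be `tokenizer.eos_token`:

```python
def build_prompt(description: str, eos_token: str) -> str:
    """Wrap a task description in the prompt template used in the usage example.

    Hypothetical helper: pass tokenizer.eos_token as eos_token.
    """
    return eos_token + "'''\n" + description + "\n'''\n" + "###\n"


def extract_code(decoded: str) -> str:
    """Return the text after the first '###' separator line.

    Hypothetical helper: assumes the decoded output echoes the prompt, which
    happens when the full generated sequence is decoded.
    """
    marker = "###\n"
    idx = decoded.find(marker)
    if idx == -1:
        # No separator found: return the whole text, trimmed
        return decoded.strip()
    return decoded[idx + len(marker):].strip()


if __name__ == "__main__":
    decoded = "'''\nfunction to add two numbers\n'''\n###\ndef add(a, b):\n    return a + b\n"
    print(extract_code(decoded))
```

Splitting on the first separator keeps the sketch simple; a description that itself contains a `###` line would need a more careful parser.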
## Training
The model was fine-tuned on XLCost-single-prompt, an improved version of the original XLCost dataset (xlcost-text-to-code). The hyperparameters are listed below.
| Hyperparameter | Value |
|---|---|
| Per-device train batch size | 8 |
| Context size (tokens) | 1024 |
| Training steps | 258 |
| Gradient accumulation steps | 4 |
| Gradient checkpointing | True |
| Learning rate | 1.8e-05 |
| Weight decay | 0.0 |
| Warmup steps | 10 |
| Schedule | linear |
Training ran on a single V100 (16 GB) GPU and took 6 h 42 m.
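The hyperparameters above determine the effective batch size and the learning-rate curve. A minimal sketch of that arithmetic, assuming a single GPU, that "linear" means linear warmup from 0 to the peak rate followed by linear decay to 0 at the last step (the behaviour of `transformers`' `get_linear_schedule_with_warmup`), and that every sequence fills the full 1024-token context (so the token count is only an upper bound):

```python
# Values taken from the hyperparameter table; NUM_GPUS = 1 per the hardware note.
PEAK_LR = 1.8e-05
WARMUP_STEPS = 10
TRAINING_STEPS = 258
PER_DEVICE_BATCH = 8
GRAD_ACCUM = 4
NUM_GPUS = 1
CONTEXT_SIZE = 1024


def linear_lr(step: int) -> float:
    """Learning rate at an optimizer step, assuming linear warmup then linear decay to 0."""
    if step < WARMUP_STEPS:
        # Warmup: ramp linearly from 0 to PEAK_LR
        return PEAK_LR * step / max(1, WARMUP_STEPS)
    # Decay: fall linearly from PEAK_LR to 0 at TRAINING_STEPS
    return PEAK_LR * max(0.0, (TRAINING_STEPS - step) / max(1, TRAINING_STEPS - WARMUP_STEPS))


# Sequences contributing to each optimizer update
effective_batch = PER_DEVICE_BATCH * GRAD_ACCUM * NUM_GPUS
# Upper bound on training tokens, assuming every sequence is a full 1024-token context
max_tokens = effective_batch * TRAINING_STEPS * CONTEXT_SIZE

if __name__ == "__main__":
    print(effective_batch)  # 32
    print(max_tokens)
    for step in (0, 5, 10, 134, 258):
        print(step, linear_lr(step))
```

With gradient accumulation, each of the 258 optimizer steps sees 8 × 4 = 32 sequences, so the run covers 8,256 training sequences in total.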
## Performance
We evaluated the model on the first 400 samples of the XLCost-single-prompt test split, comparing the output of each generated program with the expected output and reporting the pass@k metric.
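The evaluation checks generated programs by what they print. A minimal sketch of such a check, assuming each sample is a standalone Python script whose stdout should match the expected output up to surrounding whitespace; `passes` is a hypothetical helper, not the evaluation code actually used for this card, and generated code should only be run inside a sandbox:

```python
import subprocess
import sys


def passes(program: str, expected_output: str, timeout: float = 10.0) -> bool:
    """Run a generated Python program and compare its stdout with the expected output.

    Hypothetical helper: a program passes if it exits with status 0 within the
    timeout and its stripped stdout equals the stripped expected output.
    """
    try:
        result = subprocess.run(
            [sys.executable, "-c", program],
            capture_output=True,
            text=True,
            timeout=timeout,
        )
    except subprocess.TimeoutExpired:
        # Programs that hang count as failures
        return False
    return result.returncode == 0 and result.stdout.strip() == expected_output.strip()


if __name__ == "__main__":
    print(passes("print(1 + 2)", "3"))
```

Counting how many of the n samples for a problem pass this check gives the per-problem count c used by pass@k.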
| Metric | codegen-350M-multi-xlcost | codegen-350M-mono (zero-shot) | codegen-350M-mono (one-shot) | codegen-350M-mono (few-shot) |
|---|---|---|---|---|
| pass@1 | 3.70% | 0.4% | 0.35% | 0.48% |
| pass@10 | 14.5% | 3.5% | 3% | 3.75% |
The pass@k metric is the probability that at least one of k generated samples passes the tests.
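The card does not state how many samples were drawn per problem or which estimator was used. A common choice, and the one implemented by the Hugging Face `evaluate` `code_eval` metric, is the unbiased estimator pass@k = 1 − C(n − c, k) / C(n, k), where n samples are generated per problem and c of them pass. A minimal sketch under that assumption:

```python
from math import comb


def pass_at_k(n: int, c: int, k: int) -> float:
    """Unbiased pass@k estimate for one problem: 1 - C(n - c, k) / C(n, k).

    n: samples generated, c: samples that pass the tests, k: sample budget.
    """
    if n - c < k:
        # Every subset of k samples contains at least one passing sample
        return 1.0
    return 1.0 - comb(n - c, k) / comb(n, k)


def mean_pass_at_k(results: list[tuple[int, int]], k: int) -> float:
    """Average pass@k over problems, given one (n, c) pair per problem."""
    return sum(pass_at_k(n, c, k) for n, c in results) / len(results)


if __name__ == "__main__":
    # One problem with 1 passing sample out of 10, and one with none
    print(mean_pass_at_k([(10, 1), (10, 0)], k=1))
```

Averaging the per-problem estimates over the 400 evaluation samples would give a single pass@k figure like those in the table.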
## Citations
Nijkamp, Erik; Pang, Bo; Hayashi, Hiroaki; Tu, Lifu; Wang, Huan; Zhou, Yingbo; Savarese, Silvio; Xiong, Caiming. *A Conversational Paradigm for Program Synthesis*. arXiv preprint.