|
--- |
|
license: llama2 |
|
library_name: peft |
|
tags: |
|
- typescript |
|
- instruction-tuning |
|
- code-generation |
|
- lora |
|
- peft |
|
base_model: codellama/CodeLlama-13b-hf |
|
model-index: |
|
- name: lora-out |
|
results: [] |
|
datasets: |
|
- mhhmm/typescript-instruct-20k |
|
language: |
|
- en |
|
metrics: |
|
- code_eval |
|
pipeline_tag: text-generation |
|
--- |
|
|
|
## Architecture |
|
|
|
![The Architecture](https://github.com/LeVuMinhHuy/brocode/blob/master/.pics/about-the-model.png?raw=true) |
|
|
|
## About |
|
|
|
This model is a LoRA (PEFT) fine-tune of [codellama/CodeLlama-13b-hf](https://huggingface.co/codellama/CodeLlama-13b-hf) on the [mhhmm/typescript-instruct-20k](https://huggingface.co/datasets/mhhmm/typescript-instruct-20k) dataset.
|
It achieves the following results on the evaluation set: |
|
- Loss: 0.4268 |
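
### Usage

A minimal inference sketch with PEFT, assuming the adapter checkpoint is `mhhmm/typescript-instruct-20k-v2` (the one referenced in the evaluation below). The prompt format is my assumption, not something this card specifies; adjust it to the dataset's instruction template.

```
# Hedged inference sketch: load the LoRA adapter on top of the base model.
import torch
from transformers import AutoTokenizer
from peft import AutoPeftModelForCausalLM

model = AutoPeftModelForCausalLM.from_pretrained(
    "mhhmm/typescript-instruct-20k-v2",
    torch_dtype=torch.float16,
    device_map="auto",
)
tokenizer = AutoTokenizer.from_pretrained("codellama/CodeLlama-13b-hf")

# Assumed prompt style; swap in the dataset's actual instruction template.
prompt = "// Write a TypeScript function that removes duplicates from an array\n"
inputs = tokenizer(prompt, return_tensors="pt").to(model.device)
outputs = model.generate(**inputs, max_new_tokens=256, temperature=0.2, do_sample=True)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
```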
|
|
|
### Training hyperparameters |
|
|
|
The following hyperparameters were used during training: |
|
- learning_rate: 0.0002 |
|
- train_batch_size: 8 |
|
- eval_batch_size: 8 |
|
- seed: 42 |
|
- distributed_type: multi-GPU |
|
- num_devices: 2 |
|
- total_train_batch_size: 16 |
|
- total_eval_batch_size: 16 |
|
- optimizer: Adam with betas=(0.9,0.999) and epsilon=1e-08 |
|
- lr_scheduler_type: cosine |
|
- lr_scheduler_warmup_steps: 10 |
|
- num_epochs: 1 |
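
For reference, here is a sketch of the equivalent `transformers.TrainingArguments` (per-device batch size 8 across 2 GPUs gives the total of 16). The LoRA rank, alpha, and target modules are not reported in this card, so the `LoraConfig` values below are placeholders only:

```
# Sketch mirroring the hyperparameters above; Adam betas=(0.9, 0.999)
# and epsilon=1e-8 are already the AdamW defaults in the HF Trainer.
from transformers import TrainingArguments
from peft import LoraConfig

training_args = TrainingArguments(
    output_dir="lora-out",
    learning_rate=2e-4,
    per_device_train_batch_size=8,   # x2 GPUs = total train batch size 16
    per_device_eval_batch_size=8,    # x2 GPUs = total eval batch size 16
    seed=42,
    lr_scheduler_type="cosine",
    warmup_steps=10,
    num_train_epochs=1,
)

lora_config = LoraConfig(            # placeholder values, NOT from this card
    r=16,
    lora_alpha=32,
    target_modules=["q_proj", "v_proj"],
    task_type="CAUSAL_LM",
)
```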
|
|
|
### Training results |
|
|
|
| Training Loss | Epoch | Step | Validation Loss | |
|
|:-------------:|:-----:|:----:|:---------------:| |
|
| 0.7555 | 0.01 | 1 | 0.7062 | |
|
| 0.7036 | 0.05 | 7 | 0.6673 | |
|
| 0.5422 | 0.1 | 14 | 0.5152 | |
|
| 0.5351 | 0.15 | 21 | 0.4866 | |
|
| 0.495 | 0.2 | 28 | 0.4688 | |
|
| 0.5651 | 0.25 | 35 | 0.4587 | |
|
| 0.5146 | 0.3 | 42 | 0.4486 | |
|
| 0.4955 | 0.35 | 49 | 0.4469 | |
|
| 0.5117 | 0.4 | 56 | 0.4432 | |
|
| 0.5245 | 0.45 | 63 | 0.4410 | |
|
| 0.5003 | 0.5 | 70 | 0.4371 | |
|
| 0.4502 | 0.55 | 77 | 0.4340 | |
|
| 0.527 | 0.6 | 84 | 0.4315 | |
|
| 0.48 | 0.65 | 91 | 0.4305 | |
|
| 0.448 | 0.7 | 98 | 0.4289 | |
|
| 0.5427 | 0.75 | 105 | 0.4289 | |
|
| 0.4715 | 0.8 | 112 | 0.4279 | |
|
| 0.5584 | 0.85 | 119 | 0.4276 | |
|
| 0.4936 | 0.9 | 126 | 0.4267 | |
|
| 0.4788 | 0.95 | 133 | 0.4268 | |
|
| 0.476 | 1.0 | 140 | 0.4268 | |
|
|
|
|
|
### Framework versions |
|
|
|
- Transformers 4.36.0.dev0 |
|
- Pytorch 2.0.1+cu118 |
|
- Datasets 2.15.0 |
|
- Tokenizers 0.15.0 |
|
- PEFT 0.6.0 |
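
To approximate this environment (the card lists a dev build of Transformers, so the released 4.36.0 is the closest pin):

```
!pip install "transformers==4.36.0" "torch==2.0.1" "datasets==2.15.0" "tokenizers==0.15.0" "peft==0.6.0"
```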
|
|
|
### Evaluation |
|
|
|
I'm using the MultiPL-E benchmark, the same one Code Llama uses in its paper.
|
|
|
|
|
| Model                                    | Pass@k | Estimate | Num problems |
|------------------------------------------|--------|----------|--------------|
| Code Llama - Instruct 13B                | 1      | 39.0%    | 159          |
| Our 13B                                  | 1      | 42.4%    | 159          |
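
For reference, the pass@k that MultiPL-E's `pass_k.py` reports is the unbiased estimator from the Codex paper (Chen et al., 2021): with n generated samples per problem and c of them passing, pass@k = 1 - C(n-c, k)/C(n, k). A minimal sketch:

```
# Unbiased pass@k estimator (Chen et al., 2021).
# n = samples generated per problem (20 here, via --completion-limit),
# c = samples that pass the tests, k = evaluation budget.
import numpy as np

def pass_at_k(n: int, c: int, k: int) -> float:
    if n - c < k:  # every size-k subset contains at least one passing sample
        return 1.0
    return 1.0 - np.prod(1.0 - k / np.arange(n - c + 1, n + 1))

# e.g. 9 passing samples out of 20 gives pass@1 = 0.45
print(pass_at_k(20, 9, 1))
```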
|
|
|
How to reproduce my evaluation? Just follow the official MultiPL-E tutorial (https://nuprl.github.io/MultiPL-E/tutorial.html), replacing the model name with mine: `mhhmm/typescript-instruct-20k-v2`.
|
|
|
This is the code I ran on Google Colab (using an A100 40GB; yes, it requires that much GPU RAM).
|
|
|
If you have a stronger GPU, increase `--batch-size` or `--completion-limit`.
|
|
|
```
# Install dependencies and fetch the MultiPL-E harness
!pip install --upgrade pip
!pip install aiohttp numpy tqdm pytest datasets torch transformers sentencepiece
!git clone https://github.com/nuprl/MultiPL-E
%cd MultiPL-E
!mkdir typescript

# Generate completions for the TypeScript translation of HumanEval
!python3 automodel.py --name mhhmm/typescript-instruct-20k-v2 --root-dataset humaneval --lang ts --temperature 0.2 --batch-size 10 --completion-limit 20 --output-dir-prefix typescript

# Execute the completions and report pass@k
# (paths are relative to evaluation/src after the cd)
%cd evaluation/src
!python3 main.py --dir ../../typescript --output-dir ../../typescript --recursive
!python3 pass_k.py ../../typescript/*
```
|
|
|
|