
sql-code-llama-alan

This model is a fine-tuned version of codellama/CodeLlama-7b-hf on an unknown dataset. It achieves the following results on the evaluation set:

  • Loss: 0.4576
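
Since the card does not yet document a usage recipe, here is a minimal inference sketch. It assumes this repository hosts a PEFT (LoRA) adapter for codellama/CodeLlama-7b-hf and that the base model's tokenizer applies; the prompt is purely illustrative, not the format used in training.

```python
# Hedged inference sketch: load the base model, attach the adapter, generate.
import torch
from peft import PeftModel
from transformers import AutoModelForCausalLM, AutoTokenizer

base = AutoModelForCausalLM.from_pretrained(
    "codellama/CodeLlama-7b-hf",
    torch_dtype=torch.float16,
    device_map="auto",
)
model = PeftModel.from_pretrained(base, "Liu-Xiang/sql-code-llama-alan")
tokenizer = AutoTokenizer.from_pretrained("codellama/CodeLlama-7b-hf")

# Illustrative prompt only; the training prompt format is not documented.
prompt = "-- Write a SQL query that lists all customers.\n"
inputs = tokenizer(prompt, return_tensors="pt").to(model.device)
output = model.generate(**inputs, max_new_tokens=64)
print(tokenizer.decode(output[0], skip_special_tokens=True))
```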

Model description

More information needed

Intended uses & limitations

More information needed

Training and evaluation data

More information needed

Training procedure

The following bitsandbytes quantization config was used during training (a loading sketch follows the list):

  • quant_method: bitsandbytes
  • _load_in_8bit: True
  • _load_in_4bit: False
  • llm_int8_threshold: 6.0
  • llm_int8_skip_modules: None
  • llm_int8_enable_fp32_cpu_offload: False
  • llm_int8_has_fp16_weight: False
  • bnb_4bit_quant_type: fp4
  • bnb_4bit_use_double_quant: False
  • bnb_4bit_compute_dtype: float32
  • bnb_4bit_quant_storage: uint8
  • load_in_4bit: False
  • load_in_8bit: True
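
This configuration corresponds to plain 8-bit loading via bitsandbytes. A minimal sketch of reproducing it when loading the base model; the device_map choice is an assumption, not recorded in the card:

```python
# Sketch of the 8-bit quantization setup listed above; the training code is not public.
from transformers import AutoModelForCausalLM, BitsAndBytesConfig

bnb_config = BitsAndBytesConfig(
    load_in_8bit=True,                      # load_in_8bit: True, load_in_4bit: False
    llm_int8_threshold=6.0,                 # outlier threshold for int8 matmuls
    llm_int8_skip_modules=None,
    llm_int8_enable_fp32_cpu_offload=False,
    llm_int8_has_fp16_weight=False,
)

base_model = AutoModelForCausalLM.from_pretrained(
    "codellama/CodeLlama-7b-hf",
    quantization_config=bnb_config,
    device_map="auto",                      # assumption; not recorded in the card
)
```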

Training hyperparameters

The following hyperparameters were used during training (a TrainingArguments sketch follows the list):

  • learning_rate: 0.0003
  • train_batch_size: 32
  • eval_batch_size: 8
  • seed: 42
  • gradient_accumulation_steps: 4
  • total_train_batch_size: 128
  • optimizer: Adam with betas=(0.9, 0.999) and epsilon=1e-08
  • lr_scheduler_type: linear
  • lr_scheduler_warmup_steps: 100
  • training_steps: 400
  • mixed_precision_training: Native AMP
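
As a hedged reconstruction, these settings map onto transformers TrainingArguments roughly as follows; the output_dir and the single-device reading of total_train_batch_size are assumptions, not from the card:

```python
# Sketch of the TrainingArguments implied by the hyperparameters above.
from transformers import TrainingArguments

training_args = TrainingArguments(
    output_dir="sql-code-llama-alan",   # hypothetical output path
    learning_rate=3e-4,
    per_device_train_batch_size=32,
    per_device_eval_batch_size=8,
    gradient_accumulation_steps=4,      # 32 * 4 = 128 total (assuming one device)
    seed=42,
    lr_scheduler_type="linear",
    warmup_steps=100,
    max_steps=400,
    fp16=True,                          # "Native AMP" mixed precision
    optim="adamw_torch",                # card reports Adam, betas=(0.9, 0.999), eps=1e-8
)
```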

Training results

| Training Loss | Epoch  | Step | Validation Loss |
|:-------------:|:------:|:----:|:---------------:|
| 2.1992        | 0.0465 | 20   | 2.0335          |
| 1.1400        | 0.0931 | 40   | 0.8371          |
| 0.8045        | 0.1396 | 60   | 0.6549          |
| 0.5840        | 0.1862 | 80   | 0.5715          |
| 0.3807        | 0.2327 | 100  | 0.5561          |
| 0.5723        | 0.2792 | 120  | 0.5147          |
| 0.4262        | 0.3258 | 140  | 0.5056          |
| 0.6375        | 0.3723 | 160  | 0.5191          |
| 0.4839        | 0.4188 | 180  | 0.4865          |
| 0.3596        | 0.4654 | 200  | 0.4994          |
| 0.5285        | 0.5119 | 220  | 0.4803          |
| 0.4035        | 0.5585 | 240  | 0.4753          |
| 0.6019        | 0.6050 | 260  | 0.4772          |
| 0.4663        | 0.6515 | 280  | 0.4670          |
| 0.3450        | 0.6981 | 300  | 0.4746          |
| 0.5090        | 0.7446 | 320  | 0.4652          |
| 0.3946        | 0.7912 | 340  | 0.4614          |
| 0.5714        | 0.8377 | 360  | 0.4614          |
| 0.4525        | 0.8842 | 380  | 0.4585          |
| 0.3432        | 0.9308 | 400  | 0.4576          |

Framework versions

  • PEFT 0.6.0.dev0
  • Transformers 4.44.0.dev0
  • Pytorch 2.2.2+cu121
  • Datasets 2.19.1
  • Tokenizers 0.19.1