squad_qa_title_v5_full_qaonly_Qwen_Qwen1.5-4B_3e-5_lora

This model is a LoRA adapter, trained with PEFT, for Qwen/Qwen1.5-4B, fine-tuned on an unknown dataset. At the final evaluation (step 3700, epoch 49.58), it achieves the following results on the evaluation set:

  • Loss: 3.3449
  • Accuracy: 0.5876

Model description

More information needed

Intended uses & limitations

More information needed

Training and evaluation data

More information needed

Training procedure

Training hyperparameters

The following hyperparameters were used during training:

  • learning_rate: 3e-05
  • train_batch_size: 1
  • eval_batch_size: 2
  • seed: 42
  • distributed_type: multi-GPU
  • num_devices: 4
  • gradient_accumulation_steps: 8
  • total_train_batch_size: 32
  • total_eval_batch_size: 8
  • optimizer: Adam with betas=(0.9,0.999) and epsilon=1e-08
  • lr_scheduler_type: constant
  • lr_scheduler_warmup_ratio: 0.05
  • num_epochs: 50.0
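
The total batch sizes listed above follow from the per-device sizes, the device count, and the gradient accumulation steps. A minimal sketch of that arithmetic, assuming the usual Trainer convention (the function name effective_batch_sizes is illustrative and not part of any library):

```python
# Values copied from the hyperparameter list above.
train_batch_size = 1             # per-device training batch size
eval_batch_size = 2              # per-device evaluation batch size
num_devices = 4
gradient_accumulation_steps = 8


def effective_batch_sizes(per_device_train, per_device_eval, devices, grad_accum):
    """Return (total_train_batch_size, total_eval_batch_size).

    Gradient accumulation only applies to training, so the evaluation
    total is just the per-device size times the number of devices.
    """
    total_train = per_device_train * devices * grad_accum
    total_eval = per_device_eval * devices
    return total_train, total_eval


total_train, total_eval = effective_batch_sizes(
    train_batch_size, eval_batch_size, num_devices, gradient_accumulation_steps
)
print(total_train, total_eval)
```

The two printed values correspond to total_train_batch_size and total_eval_batch_size in the list.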

Training results

| Training Loss | Epoch | Step | Validation Loss | Accuracy |
|---|---|---|---|---|
| No log | 0.9916 | 74 | 1.6290 | 0.6255 |
| 1.924 | 1.9966 | 149 | 1.6617 | 0.6233 |
| 1.4922 | 2.9883 | 223 | 1.6980 | 0.6205 |
| 1.4922 | 3.9933 | 298 | 1.7298 | 0.6192 |
| 1.4068 | 4.9983 | 373 | 1.7481 | 0.6196 |
| 1.2983 | 5.9899 | 447 | 1.8004 | 0.6162 |
| 1.185 | 6.9950 | 522 | 1.8513 | 0.6133 |
| 1.185 | 8.0 | 597 | 1.9491 | 0.6078 |
| 1.0272 | 8.9916 | 671 | 2.0439 | 0.6047 |
| 0.837 | 9.9966 | 746 | 2.0819 | 0.6030 |
| 0.6959 | 10.9883 | 820 | 2.2470 | 0.5975 |
| 0.6959 | 11.9933 | 895 | 2.3402 | 0.5950 |
| 0.5675 | 12.9983 | 970 | 2.4646 | 0.5927 |
| 0.4565 | 13.9899 | 1044 | 2.5360 | 0.5919 |
| 0.4075 | 14.9950 | 1119 | 2.6063 | 0.5919 |
| 0.4075 | 16.0 | 1194 | 2.6696 | 0.5902 |
| 0.371 | 16.9916 | 1268 | 2.7577 | 0.5906 |
| 0.3443 | 17.9966 | 1343 | 2.8044 | 0.5889 |
| 0.3357 | 18.9883 | 1417 | 2.7873 | 0.5904 |
| 0.3357 | 19.9933 | 1492 | 2.7809 | 0.5918 |
| 0.3235 | 20.9983 | 1567 | 2.8768 | 0.5891 |
| 0.3152 | 21.9899 | 1641 | 2.8592 | 0.5899 |
| 0.3142 | 22.9950 | 1716 | 2.8668 | 0.5921 |
| 0.3142 | 24.0 | 1791 | 2.9501 | 0.5910 |
| 0.3092 | 24.9916 | 1865 | 2.9203 | 0.5912 |
| 0.3032 | 25.9966 | 1940 | 2.9752 | 0.5924 |
| 0.3044 | 26.9883 | 2014 | 2.9303 | 0.5905 |
| 0.3044 | 27.9933 | 2089 | 2.9913 | 0.5915 |
| 0.2998 | 28.9983 | 2164 | 2.9710 | 0.5901 |
| 0.2958 | 29.9899 | 2238 | 3.0960 | 0.5935 |
| 0.2977 | 30.9950 | 2313 | 2.9996 | 0.592 |
| 0.2977 | 32.0 | 2388 | 3.0486 | 0.5914 |
| 0.2935 | 32.9916 | 2462 | 3.0225 | 0.5911 |
| 0.2931 | 33.9966 | 2537 | 2.9860 | 0.5912 |
| 0.293 | 34.9883 | 2611 | 3.0856 | 0.5903 |
| 0.293 | 35.9933 | 2686 | 3.0234 | 0.5893 |
| 0.2909 | 36.9983 | 2761 | 3.0614 | 0.5922 |
| 0.2879 | 37.9899 | 2835 | 3.0555 | 0.5918 |
| 0.2906 | 38.9950 | 2910 | 3.1130 | 0.5921 |
| 0.2906 | 40.0 | 2985 | 3.1067 | 0.5913 |
| 0.2865 | 40.9916 | 3059 | 3.1949 | 0.5905 |
| 0.2857 | 41.9966 | 3134 | 3.1127 | 0.5913 |
| 0.2879 | 42.9883 | 3208 | 3.1623 | 0.5907 |
| 0.2879 | 43.9933 | 3283 | 3.1368 | 0.5901 |
| 0.2844 | 44.9983 | 3358 | 3.1650 | 0.5898 |
| 0.2838 | 45.9899 | 3432 | 3.2152 | 0.5893 |
| 0.2851 | 46.9950 | 3507 | 3.1605 | 0.5906 |
| 0.2851 | 48.0 | 3582 | 3.1204 | 0.5917 |
| 0.282 | 48.9916 | 3656 | 3.1551 | 0.5883 |
| 0.2812 | 49.5812 | 3700 | 3.3449 | 0.5876 |
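
In the results above, validation loss is lowest and accuracy highest at the first evaluation (step 74), while training loss keeps falling through step 3700. That pattern is consistent with overfitting, although the card does not say which checkpoint was kept. The sketch below shows one way to pick the evaluation with the lowest validation loss. It uses a handful of rows copied from the table, and the tuple layout and variable names are illustrative only.

```python
# A few (epoch, step, validation_loss, accuracy) rows copied from the table above.
rows = [
    (0.9916, 74, 1.6290, 0.6255),
    (1.9966, 149, 1.6617, 0.6233),
    (2.9883, 223, 1.6980, 0.6205),
    (9.9966, 746, 2.0819, 0.6030),
    (24.9916, 1865, 2.9203, 0.5912),
    (49.5812, 3700, 3.3449, 0.5876),
]

# Select the evaluation with the smallest validation loss (third field).
best = min(rows, key=lambda row: row[2])
final = rows[-1]

print(f"best step: {best[1]} (val loss {best[2]:.4f}, accuracy {best[3]:.4f})")
print(f"final step: {final[1]} (val loss {final[2]:.4f}, accuracy {final[3]:.4f})")
```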

Framework versions

  • PEFT 0.5.0
  • Transformers 4.40.2
  • Pytorch 2.3.0
  • Datasets 2.19.1
  • Tokenizers 0.19.1
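
Since this repository holds a PEFT adapter and not full model weights, it has to be loaded on top of the base model. The sketch below is one possible way to do that with the libraries listed above. It assumes the adapter targets a causal language model (the card does not state the task type) and that the tokenizer is taken from the base model. The helper name load_adapter_model is hypothetical.

```python
BASE_MODEL_ID = "Qwen/Qwen1.5-4B"
ADAPTER_ID = "tyzhu/squad_qa_title_v5_full_qaonly_Qwen_Qwen1.5-4B_3e-5_lora"


def load_adapter_model(base_id=BASE_MODEL_ID, adapter_id=ADAPTER_ID):
    """Load the base model and attach the LoRA adapter (hypothetical helper)."""
    # Imported lazily so that defining this function does not require
    # transformers or peft to be installed.
    from transformers import AutoModelForCausalLM, AutoTokenizer
    from peft import PeftModel

    # Assumption: the base model's tokenizer is the right one for the adapter.
    tokenizer = AutoTokenizer.from_pretrained(base_id)
    base_model = AutoModelForCausalLM.from_pretrained(base_id)

    # Wrap the base model with the adapter weights from the Hub.
    model = PeftModel.from_pretrained(base_model, adapter_id)
    model.eval()
    return tokenizer, model
```

Calling load_adapter_model() downloads the base model and the adapter, so it needs network access and enough memory for a 4B-parameter model. The prompt format used during fine-tuning is not documented in this card, so any generation prompt is a guess.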

Model tree for tyzhu/squad_qa_title_v5_full_qaonly_Qwen_Qwen1.5-4B_3e-5_lora

Base model: Qwen/Qwen1.5-4B
Adapter: this model