---
base_model: ai-forever/rugpt3large_based_on_gpt2
tags:
  - generated_from_trainer
model-index:
  - name: laws_rugpt3medium_finetune
    results: []
---

# laws_rugpt3medium_finetune

This model is a fine-tuned version of [ai-forever/rugpt3large_based_on_gpt2](https://huggingface.co/ai-forever/rugpt3large_based_on_gpt2) on an unknown dataset.
It achieves the following results on the evaluation set:

- Loss: 0.4051
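
As a quick smoke test, the checkpoint can be loaded with the standard `transformers` causal-LM API. This is a minimal sketch: the repo id below is a placeholder for wherever this checkpoint is hosted, and the Russian prompt and sampling settings are purely illustrative.

```python
from transformers import AutoModelForCausalLM, AutoTokenizer

# Placeholder repo id -- substitute the actual Hub path of this checkpoint.
model_id = "your-username/laws_rugpt3medium_finetune"

tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(model_id)

# Russian prompt ("Article 1."), since the base model is a Russian GPT-2.
inputs = tokenizer("Статья 1.", return_tensors="pt")
outputs = model.generate(**inputs, max_new_tokens=100, do_sample=True, top_p=0.95)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
```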

## Model description

More information needed

## Intended uses & limitations

More information needed

## Training and evaluation data

More information needed

## Training procedure

### Training hyperparameters

The following hyperparameters were used during training (a `TrainingArguments` sketch reproducing them follows the list):

- learning_rate: 1e-05
- train_batch_size: 4
- eval_batch_size: 4
- seed: 42
- gradient_accumulation_steps: 3
- total_train_batch_size: 12
- optimizer: Adam with betas=(0.9,0.999) and epsilon=1e-08
- lr_scheduler_type: cosine
- lr_scheduler_warmup_steps: 1000
- num_epochs: 30
- mixed_precision_training: Native AMP
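
A minimal `TrainingArguments` sketch that reproduces these settings is shown below. The `output_dir` is a placeholder, and the Adam betas/epsilon match the `transformers` defaults, so they need no explicit arguments; the exact Trainer invocation used for this checkpoint is not part of the card.

```python
from transformers import TrainingArguments

training_args = TrainingArguments(
    output_dir="laws_rugpt3medium_finetune",  # placeholder path
    learning_rate=1e-5,
    per_device_train_batch_size=4,
    per_device_eval_batch_size=4,
    seed=42,
    gradient_accumulation_steps=3,  # 4 * 3 = total train batch size of 12
    lr_scheduler_type="cosine",
    warmup_steps=1000,
    num_train_epochs=30,
    fp16=True,  # "Native AMP" mixed precision
    # adam_beta1=0.9, adam_beta2=0.999, adam_epsilon=1e-8 are the defaults.
)
```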

### Training results

| Training Loss | Epoch | Step | Validation Loss |
|:-------------:|:-----:|:----:|:---------------:|
| 3.3772 | 0.23 | 25 | 3.3796 |
| 3.4598 | 0.46 | 50 | 3.3744 |
| 3.3981 | 0.69 | 75 | 3.3587 |
| 3.4916 | 0.93 | 100 | 3.3322 |
| 3.4166 | 1.16 | 125 | 3.2980 |
| 3.3829 | 1.39 | 150 | 3.2626 |
| 3.2992 | 1.62 | 175 | 3.2285 |
| 3.3237 | 1.85 | 200 | 3.1936 |
| 3.2106 | 2.08 | 225 | 3.1601 |
| 3.1947 | 2.31 | 250 | 3.1311 |
| 3.2183 | 2.55 | 275 | 3.0988 |
| 3.2124 | 2.78 | 300 | 3.0620 |
| 3.1725 | 3.01 | 325 | 3.0266 |
| 3.078 | 3.24 | 350 | 2.9931 |
| 3.0387 | 3.47 | 375 | 2.9595 |
| 3.0944 | 3.7 | 400 | 2.9194 |
| 3.049 | 3.94 | 425 | 2.8818 |
| 2.9818 | 4.17 | 450 | 2.8438 |
| 2.9278 | 4.4 | 475 | 2.8074 |
| 2.9172 | 4.63 | 500 | 2.7671 |
| 2.8432 | 4.86 | 525 | 2.7233 |
| 2.8499 | 5.09 | 550 | 2.6794 |
| 2.76 | 5.32 | 575 | 2.6310 |
| 2.7197 | 5.56 | 600 | 2.5857 |
| 2.793 | 5.79 | 625 | 2.5458 |
| 2.6895 | 6.02 | 650 | 2.4991 |
| 2.651 | 6.25 | 675 | 2.4496 |
| 2.5484 | 6.48 | 700 | 2.4014 |
| 2.5728 | 6.71 | 725 | 2.3471 |
| 2.4865 | 6.94 | 750 | 2.2953 |
| 2.4388 | 7.18 | 775 | 2.2369 |
| 2.4137 | 7.41 | 800 | 2.1799 |
| 2.3262 | 7.64 | 825 | 2.1285 |
| 2.3043 | 7.87 | 850 | 2.0836 |
| 2.2541 | 8.1 | 875 | 2.0299 |
| 2.1348 | 8.33 | 900 | 1.9730 |
| 2.1904 | 8.56 | 925 | 1.9211 |
| 2.0869 | 8.8 | 950 | 1.8719 |
| 2.1606 | 9.03 | 975 | 1.8210 |
| 1.9323 | 9.26 | 1000 | 1.7712 |
| 1.9892 | 9.49 | 1025 | 1.7254 |
| 1.9407 | 9.72 | 1050 | 1.6757 |
| 1.8791 | 9.95 | 1075 | 1.6214 |
| 1.7791 | 10.19 | 1100 | 1.5702 |
| 1.7523 | 10.42 | 1125 | 1.5284 |
| 1.7336 | 10.65 | 1150 | 1.4912 |
| 1.7709 | 10.88 | 1175 | 1.4475 |
| 1.6533 | 11.11 | 1200 | 1.3941 |
| 1.5671 | 11.34 | 1225 | 1.3536 |
| 1.5394 | 11.57 | 1250 | 1.3209 |
| 1.6085 | 11.81 | 1275 | 1.2921 |
| 1.5465 | 12.04 | 1300 | 1.2599 |
| 1.4172 | 12.27 | 1325 | 1.2292 |
| 1.4422 | 12.5 | 1350 | 1.1927 |
| 1.4708 | 12.73 | 1375 | 1.1563 |
| 1.3859 | 12.96 | 1400 | 1.1260 |
| 1.2036 | 13.19 | 1425 | 1.0932 |
| 1.3393 | 13.43 | 1450 | 1.0697 |
| 1.3203 | 13.66 | 1475 | 1.0376 |
| 1.2902 | 13.89 | 1500 | 1.0084 |
| 1.2356 | 14.12 | 1525 | 0.9760 |
| 1.2329 | 14.35 | 1550 | 0.9531 |
| 1.2039 | 14.58 | 1575 | 0.9343 |
| 1.1521 | 14.81 | 1600 | 0.9084 |
| 1.0754 | 15.05 | 1625 | 0.8786 |
| 1.0786 | 15.28 | 1650 | 0.8620 |
| 1.1052 | 15.51 | 1675 | 0.8395 |
| 1.0765 | 15.74 | 1700 | 0.8192 |
| 1.0817 | 15.97 | 1725 | 0.8002 |
| 1.0285 | 16.2 | 1750 | 0.7715 |
| 1.0313 | 16.44 | 1775 | 0.7612 |
| 0.9682 | 16.67 | 1800 | 0.7458 |
| 1.0025 | 16.9 | 1825 | 0.7267 |
| 0.9516 | 17.13 | 1850 | 0.7052 |
| 0.9475 | 17.36 | 1875 | 0.6952 |
| 0.8851 | 17.59 | 1900 | 0.6745 |
| 0.9463 | 17.82 | 1925 | 0.6602 |
| 0.8937 | 18.06 | 1950 | 0.6436 |
| 0.8135 | 18.29 | 1975 | 0.6316 |
| 0.8738 | 18.52 | 2000 | 0.6172 |
| 0.8585 | 18.75 | 2025 | 0.6072 |
| 0.8782 | 18.98 | 2050 | 0.5968 |
| 0.8324 | 19.21 | 2075 | 0.5789 |
| 0.7818 | 19.44 | 2100 | 0.5688 |
| 0.8375 | 19.68 | 2125 | 0.5602 |
| 0.7838 | 19.91 | 2150 | 0.5498 |
| 0.8015 | 20.14 | 2175 | 0.5369 |
| 0.724 | 20.37 | 2200 | 0.5299 |
| 0.7298 | 20.6 | 2225 | 0.5233 |
| 0.8079 | 20.83 | 2250 | 0.5141 |
| 0.77 | 21.06 | 2275 | 0.5058 |
| 0.7299 | 21.3 | 2300 | 0.4995 |
| 0.7152 | 21.53 | 2325 | 0.4893 |
| 0.6905 | 21.76 | 2350 | 0.4882 |
| 0.7492 | 21.99 | 2375 | 0.4779 |
| 0.6817 | 22.22 | 2400 | 0.4681 |
| 0.6893 | 22.45 | 2425 | 0.4652 |
| 0.7098 | 22.69 | 2450 | 0.4611 |
| 0.7063 | 22.92 | 2475 | 0.4582 |
| 0.6562 | 23.15 | 2500 | 0.4511 |
| 0.7083 | 23.38 | 2525 | 0.4474 |
| 0.6684 | 23.61 | 2550 | 0.4438 |
| 0.6688 | 23.84 | 2575 | 0.4398 |
| 0.6561 | 24.07 | 2600 | 0.4334 |
| 0.6664 | 24.31 | 2625 | 0.4318 |
| 0.6418 | 24.54 | 2650 | 0.4294 |
| 0.6723 | 24.77 | 2675 | 0.4249 |
| 0.6164 | 25.0 | 2700 | 0.4215 |
| 0.6348 | 25.23 | 2725 | 0.4203 |
| 0.6464 | 25.46 | 2750 | 0.4182 |
| 0.6392 | 25.69 | 2775 | 0.4171 |
| 0.6186 | 25.93 | 2800 | 0.4156 |
| 0.6447 | 26.16 | 2825 | 0.4138 |
| 0.6445 | 26.39 | 2850 | 0.4114 |
| 0.6037 | 26.62 | 2875 | 0.4109 |
| 0.6074 | 26.85 | 2900 | 0.4099 |
| 0.6509 | 27.08 | 2925 | 0.4092 |
| 0.6416 | 27.31 | 2950 | 0.4082 |
| 0.6391 | 27.55 | 2975 | 0.4075 |
| 0.594 | 27.78 | 3000 | 0.4071 |
| 0.6231 | 28.01 | 3025 | 0.4066 |
| 0.6151 | 28.24 | 3050 | 0.4061 |
| 0.6464 | 28.47 | 3075 | 0.4056 |
| 0.6024 | 28.7 | 3100 | 0.4054 |
| 0.6277 | 28.94 | 3125 | 0.4052 |
| 0.6017 | 29.17 | 3150 | 0.4052 |
| 0.6226 | 29.4 | 3175 | 0.4051 |
| 0.6084 | 29.63 | 3200 | 0.4051 |
| 0.639 | 29.86 | 3225 | 0.4051 |
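
The validation loss above is the mean token-level cross-entropy (in nats) that the `transformers` Trainer reports, so a perplexity can be derived from it directly; the snippet below assumes that interpretation.

```python
import math

# Final validation loss from the table above.
eval_loss = 0.4051

# Perplexity = exp(cross-entropy), assuming the loss is mean
# token-level cross-entropy in nats (the Trainer's convention).
print(f"perplexity ~= {math.exp(eval_loss):.2f}")  # ~1.50
```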

### Framework versions

- Transformers 4.35.2
- Pytorch 2.1.0+cu121
- Datasets 2.16.0
- Tokenizers 0.15.0