---
base_model: ai-forever/rugpt3large_based_on_gpt2
tags:
- generated_from_trainer
model-index:
- name: laws_rugpt3medium_finetune
  results: []
---

# laws_rugpt3medium_finetune

This model is a fine-tuned version of [ai-forever/rugpt3large_based_on_gpt2](https://huggingface.co/ai-forever/rugpt3large_based_on_gpt2) on an unknown dataset.
It achieves the following results on the evaluation set:
- Loss: 0.4051

## Model description

More information needed

## Intended uses & limitations

More information needed

## Training and evaluation data

More information needed

## Training procedure

### Training hyperparameters

The following hyperparameters were used during training:
- learning_rate: 1e-05
- train_batch_size: 4
- eval_batch_size: 4
- seed: 42
- gradient_accumulation_steps: 3
- total_train_batch_size: 12
- optimizer: Adam with betas=(0.9,0.999) and epsilon=1e-08
- lr_scheduler_type: cosine
- lr_scheduler_warmup_steps: 1000
- num_epochs: 30
- mixed_precision_training: Native AMP

### Training results

| Training Loss | Epoch | Step | Validation Loss |
|:-------------:|:-----:|:----:|:---------------:|
| 3.3772 | 0.23 | 25 | 3.3796 |
| 3.4598 | 0.46 | 50 | 3.3744 |
| 3.3981 | 0.69 | 75 | 3.3587 |
| 3.4916 | 0.93 | 100 | 3.3322 |
| 3.4166 | 1.16 | 125 | 3.2980 |
| 3.3829 | 1.39 | 150 | 3.2626 |
| 3.2992 | 1.62 | 175 | 3.2285 |
| 3.3237 | 1.85 | 200 | 3.1936 |
| 3.2106 | 2.08 | 225 | 3.1601 |
| 3.1947 | 2.31 | 250 | 3.1311 |
| 3.2183 | 2.55 | 275 | 3.0988 |
| 3.2124 | 2.78 | 300 | 3.0620 |
| 3.1725 | 3.01 | 325 | 3.0266 |
| 3.078 | 3.24 | 350 | 2.9931 |
| 3.0387 | 3.47 | 375 | 2.9595 |
| 3.0944 | 3.7 | 400 | 2.9194 |
| 3.049 | 3.94 | 425 | 2.8818 |
| 2.9818 | 4.17 | 450 | 2.8438 |
| 2.9278 | 4.4 | 475 | 2.8074 |
| 2.9172 | 4.63 | 500 | 2.7671 |
| 2.8432 | 4.86 | 525 | 2.7233 |
| 2.8499 | 5.09 | 550 | 2.6794 |
| 2.76 | 5.32 | 575 | 2.6310 |
| 2.7197 | 5.56 | 600 | 2.5857 |
| 2.793 | 5.79 | 625 | 2.5458 |
| 2.6895 | 6.02 | 650 | 2.4991 |
| 2.651 | 6.25 | 675 | 2.4496 |
| 2.5484 | 6.48 | 700 | 2.4014 |
| 2.5728 | 6.71 | 725 | 2.3471 |
| 2.4865 | 6.94 | 750 | 2.2953 |
| 2.4388 | 7.18 | 775 | 2.2369 |
| 2.4137 | 7.41 | 800 | 2.1799 |
| 2.3262 | 7.64 | 825 | 2.1285 |
| 2.3043 | 7.87 | 850 | 2.0836 |
| 2.2541 | 8.1 | 875 | 2.0299 |
| 2.1348 | 8.33 | 900 | 1.9730 |
| 2.1904 | 8.56 | 925 | 1.9211 |
| 2.0869 | 8.8 | 950 | 1.8719 |
| 2.1606 | 9.03 | 975 | 1.8210 |
| 1.9323 | 9.26 | 1000 | 1.7712 |
| 1.9892 | 9.49 | 1025 | 1.7254 |
| 1.9407 | 9.72 | 1050 | 1.6757 |
| 1.8791 | 9.95 | 1075 | 1.6214 |
| 1.7791 | 10.19 | 1100 | 1.5702 |
| 1.7523 | 10.42 | 1125 | 1.5284 |
| 1.7336 | 10.65 | 1150 | 1.4912 |
| 1.7709 | 10.88 | 1175 | 1.4475 |
| 1.6533 | 11.11 | 1200 | 1.3941 |
| 1.5671 | 11.34 | 1225 | 1.3536 |
| 1.5394 | 11.57 | 1250 | 1.3209 |
| 1.6085 | 11.81 | 1275 | 1.2921 |
| 1.5465 | 12.04 | 1300 | 1.2599 |
| 1.4172 | 12.27 | 1325 | 1.2292 |
| 1.4422 | 12.5 | 1350 | 1.1927 |
| 1.4708 | 12.73 | 1375 | 1.1563 |
| 1.3859 | 12.96 | 1400 | 1.1260 |
| 1.2036 | 13.19 | 1425 | 1.0932 |
| 1.3393 | 13.43 | 1450 | 1.0697 |
| 1.3203 | 13.66 | 1475 | 1.0376 |
| 1.2902 | 13.89 | 1500 | 1.0084 |
| 1.2356 | 14.12 | 1525 | 0.9760 |
| 1.2329 | 14.35 | 1550 | 0.9531 |
| 1.2039 | 14.58 | 1575 | 0.9343 |
| 1.1521 | 14.81 | 1600 | 0.9084 |
| 1.0754 | 15.05 | 1625 | 0.8786 |
| 1.0786 | 15.28 | 1650 | 0.8620 |
| 1.1052 | 15.51 | 1675 | 0.8395 |
| 1.0765 | 15.74 | 1700 | 0.8192 |
| 1.0817 | 15.97 | 1725 | 0.8002 |
| 1.0285 | 16.2 | 1750 | 0.7715 |
| 1.0313 | 16.44 | 1775 | 0.7612 |
| 0.9682 | 16.67 | 1800 | 0.7458 |
| 1.0025 | 16.9 | 1825 | 0.7267 |
| 0.9516 | 17.13 | 1850 | 0.7052 |
| 0.9475 | 17.36 | 1875 | 0.6952 |
| 0.8851 | 17.59 | 1900 | 0.6745 |
| 0.9463 | 17.82 | 1925 | 0.6602 |
| 0.8937 | 18.06 | 1950 | 0.6436 |
| 0.8135 | 18.29 | 1975 | 0.6316 |
| 0.8738 | 18.52 | 2000 | 0.6172 |
| 0.8585 | 18.75 | 2025 | 0.6072 |
| 0.8782 | 18.98 | 2050 | 0.5968 |
| 0.8324 | 19.21 | 2075 | 0.5789 |
| 0.7818 | 19.44 | 2100 | 0.5688 |
| 0.8375 | 19.68 | 2125 | 0.5602 |
| 0.7838 | 19.91 | 2150 | 0.5498 |
| 0.8015 | 20.14 | 2175 | 0.5369 |
| 0.724 | 20.37 | 2200 | 0.5299 |
| 0.7298 | 20.6 | 2225 | 0.5233 |
| 0.8079 | 20.83 | 2250 | 0.5141 |
| 0.77 | 21.06 | 2275 | 0.5058 |
| 0.7299 | 21.3 | 2300 | 0.4995 |
| 0.7152 | 21.53 | 2325 | 0.4893 |
| 0.6905 | 21.76 | 2350 | 0.4882 |
| 0.7492 | 21.99 | 2375 | 0.4779 |
| 0.6817 | 22.22 | 2400 | 0.4681 |
| 0.6893 | 22.45 | 2425 | 0.4652 |
| 0.7098 | 22.69 | 2450 | 0.4611 |
| 0.7063 | 22.92 | 2475 | 0.4582 |
| 0.6562 | 23.15 | 2500 | 0.4511 |
| 0.7083 | 23.38 | 2525 | 0.4474 |
| 0.6684 | 23.61 | 2550 | 0.4438 |
| 0.6688 | 23.84 | 2575 | 0.4398 |
| 0.6561 | 24.07 | 2600 | 0.4334 |
| 0.6664 | 24.31 | 2625 | 0.4318 |
| 0.6418 | 24.54 | 2650 | 0.4294 |
| 0.6723 | 24.77 | 2675 | 0.4249 |
| 0.6164 | 25.0 | 2700 | 0.4215 |
| 0.6348 | 25.23 | 2725 | 0.4203 |
| 0.6464 | 25.46 | 2750 | 0.4182 |
| 0.6392 | 25.69 | 2775 | 0.4171 |
| 0.6186 | 25.93 | 2800 | 0.4156 |
| 0.6447 | 26.16 | 2825 | 0.4138 |
| 0.6445 | 26.39 | 2850 | 0.4114 |
| 0.6037 | 26.62 | 2875 | 0.4109 |
| 0.6074 | 26.85 | 2900 | 0.4099 |
| 0.6509 | 27.08 | 2925 | 0.4092 |
| 0.6416 | 27.31 | 2950 | 0.4082 |
| 0.6391 | 27.55 | 2975 | 0.4075 |
| 0.594 | 27.78 | 3000 | 0.4071 |
| 0.6231 | 28.01 | 3025 | 0.4066 |
| 0.6151 | 28.24 | 3050 | 0.4061 |
| 0.6464 | 28.47 | 3075 | 0.4056 |
| 0.6024 | 28.7 | 3100 | 0.4054 |
| 0.6277 | 28.94 | 3125 | 0.4052 |
| 0.6017 | 29.17 | 3150 | 0.4052 |
| 0.6226 | 29.4 | 3175 | 0.4051 |
| 0.6084 | 29.63 | 3200 | 0.4051 |
| 0.639 | 29.86 | 3225 | 0.4051 |

### Framework versions

- Transformers 4.35.2
- Pytorch 2.1.0+cu121
- Datasets 2.16.0
- Tokenizers 0.15.0
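The hyperparameters above imply an effective batch size of 12 (per-device batch of 4 × 3 gradient-accumulation steps) and a cosine learning-rate schedule with 1000 linear warmup steps over the 3225 optimizer steps shown in the training log. A minimal sketch of that schedule, assuming the standard half-cycle cosine-with-warmup formula used by `transformers`' `get_cosine_schedule_with_warmup` (the step counts and base rate are read off this card):

```python
import math

def cosine_lr_with_warmup(step: int,
                          base_lr: float = 1e-5,
                          warmup_steps: int = 1000,
                          total_steps: int = 3225) -> float:
    """Learning rate at a given optimizer step: linear warmup from 0
    to base_lr over warmup_steps, then half-cycle cosine decay to 0."""
    if step < warmup_steps:
        return base_lr * step / max(1, warmup_steps)
    progress = (step - warmup_steps) / max(1, total_steps - warmup_steps)
    return base_lr * 0.5 * (1.0 + math.cos(math.pi * progress))

# Effective batch size implied by the card's hyperparameters.
effective_batch = 4 * 3  # train_batch_size * gradient_accumulation_steps = 12
```

The rate peaks at exactly 1e-05 at step 1000 and decays to 0 at step 3225, which is consistent with the slowing loss improvements in the last few epochs of the table.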