---
base_model: ai-forever/rugpt3large_based_on_gpt2
tags:
- generated_from_trainer
model-index:
- name: laws_rugpt3medium_finetune
  results: []
---

# laws_rugpt3medium_finetune

This model is a fine-tuned version of [ai-forever/rugpt3large_based_on_gpt2](https://huggingface.co/ai-forever/rugpt3large_based_on_gpt2) on an unknown dataset.
It achieves the following results on the evaluation set:
- Loss: 0.4051

## Model description

More information needed

## Intended uses & limitations

More information needed

## Training and evaluation data

More information needed

## Training procedure

### Training hyperparameters

The following hyperparameters were used during training:
- learning_rate: 1e-05
- train_batch_size: 4
- eval_batch_size: 4
- seed: 42
- gradient_accumulation_steps: 3
- total_train_batch_size: 12
- optimizer: Adam with betas=(0.9,0.999) and epsilon=1e-08
- lr_scheduler_type: cosine
- lr_scheduler_warmup_steps: 1000
- num_epochs: 30
- mixed_precision_training: Native AMP

### Training results

| Training Loss | Epoch | Step | Validation Loss |
|:-------------:|:-----:|:----:|:---------------:|
| 3.3772 | 0.23 | 25 | 3.3796 |
| 3.4598 | 0.46 | 50 | 3.3744 |
| 3.3981 | 0.69 | 75 | 3.3587 |
| 3.4916 | 0.93 | 100 | 3.3322 |
| 3.4166 | 1.16 | 125 | 3.2980 |
| 3.3829 | 1.39 | 150 | 3.2626 |
| 3.2992 | 1.62 | 175 | 3.2285 |
| 3.3237 | 1.85 | 200 | 3.1936 |
| 3.2106 | 2.08 | 225 | 3.1601 |
| 3.1947 | 2.31 | 250 | 3.1311 |
| 3.2183 | 2.55 | 275 | 3.0988 |
| 3.2124 | 2.78 | 300 | 3.0620 |
| 3.1725 | 3.01 | 325 | 3.0266 |
| 3.078 | 3.24 | 350 | 2.9931 |
| 3.0387 | 3.47 | 375 | 2.9595 |
| 3.0944 | 3.7 | 400 | 2.9194 |
| 3.049 | 3.94 | 425 | 2.8818 |
| 2.9818 | 4.17 | 450 | 2.8438 |
| 2.9278 | 4.4 | 475 | 2.8074 |
| 2.9172 | 4.63 | 500 | 2.7671 |
| 2.8432 | 4.86 | 525 | 2.7233 |
| 2.8499 | 5.09 | 550 | 2.6794 |
| 2.76 | 5.32 | 575 | 2.6310 |
| 2.7197 | 5.56 | 600 | 2.5857 |
| 2.793 | 5.79 | 625 | 2.5458 |
| 2.6895 | 6.02 | 650 | 2.4991 |
| 2.651 | 6.25 | 675 | 2.4496 |
| 2.5484 | 6.48 | 700 | 2.4014 |
| 2.5728 | 6.71 | 725 | 2.3471 |
| 2.4865 | 6.94 | 750 | 2.2953 |
| 2.4388 | 7.18 | 775 | 2.2369 |
| 2.4137 | 7.41 | 800 | 2.1799 |
| 2.3262 | 7.64 | 825 | 2.1285 |
| 2.3043 | 7.87 | 850 | 2.0836 |
| 2.2541 | 8.1 | 875 | 2.0299 |
| 2.1348 | 8.33 | 900 | 1.9730 |
| 2.1904 | 8.56 | 925 | 1.9211 |
| 2.0869 | 8.8 | 950 | 1.8719 |
| 2.1606 | 9.03 | 975 | 1.8210 |
| 1.9323 | 9.26 | 1000 | 1.7712 |
| 1.9892 | 9.49 | 1025 | 1.7254 |
| 1.9407 | 9.72 | 1050 | 1.6757 |
| 1.8791 | 9.95 | 1075 | 1.6214 |
| 1.7791 | 10.19 | 1100 | 1.5702 |
| 1.7523 | 10.42 | 1125 | 1.5284 |
| 1.7336 | 10.65 | 1150 | 1.4912 |
| 1.7709 | 10.88 | 1175 | 1.4475 |
| 1.6533 | 11.11 | 1200 | 1.3941 |
| 1.5671 | 11.34 | 1225 | 1.3536 |
| 1.5394 | 11.57 | 1250 | 1.3209 |
| 1.6085 | 11.81 | 1275 | 1.2921 |
| 1.5465 | 12.04 | 1300 | 1.2599 |
| 1.4172 | 12.27 | 1325 | 1.2292 |
| 1.4422 | 12.5 | 1350 | 1.1927 |
| 1.4708 | 12.73 | 1375 | 1.1563 |
| 1.3859 | 12.96 | 1400 | 1.1260 |
| 1.2036 | 13.19 | 1425 | 1.0932 |
| 1.3393 | 13.43 | 1450 | 1.0697 |
| 1.3203 | 13.66 | 1475 | 1.0376 |
| 1.2902 | 13.89 | 1500 | 1.0084 |
| 1.2356 | 14.12 | 1525 | 0.9760 |
| 1.2329 | 14.35 | 1550 | 0.9531 |
| 1.2039 | 14.58 | 1575 | 0.9343 |
| 1.1521 | 14.81 | 1600 | 0.9084 |
| 1.0754 | 15.05 | 1625 | 0.8786 |
| 1.0786 | 15.28 | 1650 | 0.8620 |
| 1.1052 | 15.51 | 1675 | 0.8395 |
| 1.0765 | 15.74 | 1700 | 0.8192 |
| 1.0817 | 15.97 | 1725 | 0.8002 |
| 1.0285 | 16.2 | 1750 | 0.7715 |
| 1.0313 | 16.44 | 1775 | 0.7612 |
| 0.9682 | 16.67 | 1800 | 0.7458 |
| 1.0025 | 16.9 | 1825 | 0.7267 |
| 0.9516 | 17.13 | 1850 | 0.7052 |
| 0.9475 | 17.36 | 1875 | 0.6952 |
| 0.8851 | 17.59 | 1900 | 0.6745 |
| 0.9463 | 17.82 | 1925 | 0.6602 |
| 0.8937 | 18.06 | 1950 | 0.6436 |
| 0.8135 | 18.29 | 1975 | 0.6316 |
| 0.8738 | 18.52 | 2000 | 0.6172 |
| 0.8585 | 18.75 | 2025 | 0.6072 |
| 0.8782 | 18.98 | 2050 | 0.5968 |
| 0.8324 | 19.21 | 2075 | 0.5789 |
| 0.7818 | 19.44 | 2100 | 0.5688 |
| 0.8375 | 19.68 | 2125 | 0.5602 |
| 0.7838 | 19.91 | 2150 | 0.5498 |
| 0.8015 | 20.14 | 2175 | 0.5369 |
| 0.724 | 20.37 | 2200 | 0.5299 |
| 0.7298 | 20.6 | 2225 | 0.5233 |
| 0.8079 | 20.83 | 2250 | 0.5141 |
| 0.77 | 21.06 | 2275 | 0.5058 |
| 0.7299 | 21.3 | 2300 | 0.4995 |
| 0.7152 | 21.53 | 2325 | 0.4893 |
| 0.6905 | 21.76 | 2350 | 0.4882 |
| 0.7492 | 21.99 | 2375 | 0.4779 |
| 0.6817 | 22.22 | 2400 | 0.4681 |
| 0.6893 | 22.45 | 2425 | 0.4652 |
| 0.7098 | 22.69 | 2450 | 0.4611 |
| 0.7063 | 22.92 | 2475 | 0.4582 |
| 0.6562 | 23.15 | 2500 | 0.4511 |
| 0.7083 | 23.38 | 2525 | 0.4474 |
| 0.6684 | 23.61 | 2550 | 0.4438 |
| 0.6688 | 23.84 | 2575 | 0.4398 |
| 0.6561 | 24.07 | 2600 | 0.4334 |
| 0.6664 | 24.31 | 2625 | 0.4318 |
| 0.6418 | 24.54 | 2650 | 0.4294 |
| 0.6723 | 24.77 | 2675 | 0.4249 |
| 0.6164 | 25.0 | 2700 | 0.4215 |
| 0.6348 | 25.23 | 2725 | 0.4203 |
| 0.6464 | 25.46 | 2750 | 0.4182 |
| 0.6392 | 25.69 | 2775 | 0.4171 |
| 0.6186 | 25.93 | 2800 | 0.4156 |
| 0.6447 | 26.16 | 2825 | 0.4138 |
| 0.6445 | 26.39 | 2850 | 0.4114 |
| 0.6037 | 26.62 | 2875 | 0.4109 |
| 0.6074 | 26.85 | 2900 | 0.4099 |
| 0.6509 | 27.08 | 2925 | 0.4092 |
| 0.6416 | 27.31 | 2950 | 0.4082 |
| 0.6391 | 27.55 | 2975 | 0.4075 |
| 0.594 | 27.78 | 3000 | 0.4071 |
| 0.6231 | 28.01 | 3025 | 0.4066 |
| 0.6151 | 28.24 | 3050 | 0.4061 |
| 0.6464 | 28.47 | 3075 | 0.4056 |
| 0.6024 | 28.7 | 3100 | 0.4054 |
| 0.6277 | 28.94 | 3125 | 0.4052 |
| 0.6017 | 29.17 | 3150 | 0.4052 |
| 0.6226 | 29.4 | 3175 | 0.4051 |
| 0.6084 | 29.63 | 3200 | 0.4051 |
| 0.639 | 29.86 | 3225 | 0.4051 |

### Framework versions

- Transformers 4.35.2
- Pytorch 2.1.0+cu121
- Datasets 2.16.0
- Tokenizers 0.15.0
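The hyperparameters above imply an effective batch size of 12 (per-device batch of 4 × 3 gradient-accumulation steps) and a cosine learning-rate schedule with 1000 linear warmup steps over the 3225 optimizer steps shown in the training log. A minimal sketch of that schedule, assuming the standard half-cycle cosine-with-warmup formula used by `transformers`' `get_cosine_schedule_with_warmup` (the step counts and base rate are read off this card):

```python
import math

def cosine_lr_with_warmup(step: int,
                          base_lr: float = 1e-5,
                          warmup_steps: int = 1000,
                          total_steps: int = 3225) -> float:
    """Learning rate at a given optimizer step: linear warmup from 0
    to base_lr over warmup_steps, then half-cycle cosine decay to 0."""
    if step < warmup_steps:
        return base_lr * step / max(1, warmup_steps)
    progress = (step - warmup_steps) / max(1, total_steps - warmup_steps)
    return base_lr * 0.5 * (1.0 + math.cos(math.pi * progress))

# Effective batch size implied by the card's hyperparameters.
effective_batch = 4 * 3  # train_batch_size * gradient_accumulation_steps = 12
```

The rate peaks at exactly 1e-05 at step 1000 and decays to 0 at step 3225, which is consistent with the slowing loss improvements in the last few epochs of the table.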