---
license: apache-2.0
base_model: google/flan-t5-small
tags:
  - generated_from_trainer
model-index:
  - name: flant5-tuned-15-warmup
    results: []
---

# flant5-tuned-15-warmup

This model is a fine-tuned version of [google/flan-t5-small](https://huggingface.co/google/flan-t5-small) on an unspecified dataset. It achieves the following results on the evaluation set:

- Loss: 1.8530
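
As a quick sanity check, the checkpoint can be loaded like any other seq2seq Transformers model. This is a minimal sketch: the repo id below is assumed from the card title, and the prompt is only illustrative since the card does not document the training task.

```python
from transformers import AutoModelForSeq2SeqLM, AutoTokenizer

# Assumed repo id (taken from the card title); adjust to the actual location.
model_id = "mixtralyanis/flant5-tuned-15-warmup"

tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForSeq2SeqLM.from_pretrained(model_id)

# Illustrative prompt only; replace with the task the model was tuned for.
inputs = tokenizer("Answer the question: what is the capital of France?",
                   return_tensors="pt")
outputs = model.generate(**inputs, max_new_tokens=32)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
```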

## Model description

More information needed

## Intended uses & limitations

More information needed

## Training and evaluation data

More information needed

## Training procedure

### Training hyperparameters

The following hyperparameters were used during training:

- learning_rate: 0.0005
- train_batch_size: 16
- eval_batch_size: 16
- seed: 42
- optimizer: Adam with betas=(0.9,0.999) and epsilon=1e-08
- lr_scheduler_type: linear
- lr_scheduler_warmup_steps: 3
- num_epochs: 15
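
These settings correspond to a standard 🤗 Trainer run. A sketch of the equivalent configuration is below; the output_dir and the surrounding Trainer wiring (dataset, data collator) are assumptions, since the card does not include them.

```python
from transformers import AutoModelForSeq2SeqLM, AutoTokenizer, Seq2SeqTrainingArguments

# Reconstruction of the reported hyperparameters; everything not listed in the
# card (output_dir, dataset, collator, Trainer setup) is an assumption.
training_args = Seq2SeqTrainingArguments(
    output_dir="flant5-tuned-15-warmup",  # assumed
    learning_rate=5e-4,
    per_device_train_batch_size=16,
    per_device_eval_batch_size=16,
    seed=42,
    lr_scheduler_type="linear",           # linear decay after warmup
    warmup_steps=3,
    num_train_epochs=15,
)

# Adam with betas=(0.9, 0.999) and epsilon=1e-8 matches the Trainer's default
# optimizer settings, so no explicit optimizer override is needed.
tokenizer = AutoTokenizer.from_pretrained("google/flan-t5-small")
model = AutoModelForSeq2SeqLM.from_pretrained("google/flan-t5-small")
```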

### Training results

| Training Loss | Epoch | Step | Validation Loss |
|:-------------:|:-----:|:----:|:---------------:|
| 2.1664 | 0.04 | 1 | 1.8055 |
| 2.3197 | 0.08 | 2 | 1.6900 |
| 1.7394 | 0.12 | 3 | 1.5971 |
| 1.7463 | 0.17 | 4 | 1.5465 |
| 1.998 | 0.21 | 5 | 1.4992 |
| 1.7127 | 0.25 | 6 | 1.4731 |
| 1.4435 | 0.29 | 7 | 1.4587 |
| 1.5656 | 0.33 | 8 | 1.4509 |
| 1.8111 | 0.38 | 9 | 1.4381 |
| 1.7371 | 0.42 | 10 | 1.4286 |
| 1.3207 | 0.46 | 11 | 1.4276 |
| 1.4292 | 0.5 | 12 | 1.4301 |
| 1.7736 | 0.54 | 13 | 1.4260 |
| 1.628 | 0.58 | 14 | 1.4183 |
| 2.0071 | 0.62 | 15 | 1.4017 |
| 1.2926 | 0.67 | 16 | 1.3891 |
| 1.5699 | 0.71 | 17 | 1.3769 |
| 1.5532 | 0.75 | 18 | 1.3664 |
| 1.4906 | 0.79 | 19 | 1.3580 |
| 1.6887 | 0.83 | 20 | 1.3502 |
| 1.1568 | 0.88 | 21 | 1.3462 |
| 1.3267 | 0.92 | 22 | 1.3443 |
| 1.4873 | 0.96 | 23 | 1.3447 |
| 2.3224 | 1.0 | 24 | 1.3463 |
| 1.0034 | 1.04 | 25 | 1.3531 |
| 1.0839 | 1.08 | 26 | 1.3627 |
| 1.4106 | 1.12 | 27 | 1.3660 |
| 1.3157 | 1.17 | 28 | 1.3671 |
| 1.3775 | 1.21 | 29 | 1.3600 |
| 1.0208 | 1.25 | 30 | 1.3538 |
| 1.0619 | 1.29 | 31 | 1.3459 |
| 1.5116 | 1.33 | 32 | 1.3366 |
| 1.242 | 1.38 | 33 | 1.3282 |
| 1.3132 | 1.42 | 34 | 1.3204 |
| 1.2549 | 1.46 | 35 | 1.3152 |
| 1.1817 | 1.5 | 36 | 1.3105 |
| 1.3416 | 1.54 | 37 | 1.3068 |
| 1.1483 | 1.58 | 38 | 1.3078 |
| 1.2318 | 1.62 | 39 | 1.3092 |
| 1.2906 | 1.67 | 40 | 1.3128 |
| 1.2147 | 1.71 | 41 | 1.3175 |
| 1.184 | 1.75 | 42 | 1.3222 |
| 1.1648 | 1.79 | 43 | 1.3295 |
| 1.3964 | 1.83 | 44 | 1.3331 |
| 1.1426 | 1.88 | 45 | 1.3380 |
| 1.3105 | 1.92 | 46 | 1.3378 |
| 0.8797 | 1.96 | 47 | 1.3375 |
| 0.9167 | 2.0 | 48 | 1.3380 |
| 1.0769 | 2.04 | 49 | 1.3368 |
| 1.1431 | 2.08 | 50 | 1.3392 |
| 1.0875 | 2.12 | 51 | 1.3423 |
| 0.8795 | 2.17 | 52 | 1.3458 |
| 0.9912 | 2.21 | 53 | 1.3484 |
| 0.9544 | 2.25 | 54 | 1.3505 |
| 0.9606 | 2.29 | 55 | 1.3499 |
| 1.0507 | 2.33 | 56 | 1.3507 |
| 0.7372 | 2.38 | 57 | 1.3561 |
| 0.9674 | 2.42 | 58 | 1.3612 |
| 0.8529 | 2.46 | 59 | 1.3708 |
| 0.9837 | 2.5 | 60 | 1.3743 |
| 1.216 | 2.54 | 61 | 1.3748 |
| 1.0467 | 2.58 | 62 | 1.3729 |
| 0.7987 | 2.62 | 63 | 1.3726 |
| 1.0887 | 2.67 | 64 | 1.3711 |
| 0.873 | 2.71 | 65 | 1.3692 |
| 1.0544 | 2.75 | 66 | 1.3665 |
| 0.9743 | 2.79 | 67 | 1.3603 |
| 0.9914 | 2.83 | 68 | 1.3517 |
| 1.2247 | 2.88 | 69 | 1.3429 |
| 1.1727 | 2.92 | 70 | 1.3338 |
| 0.9567 | 2.96 | 71 | 1.3263 |
| 1.365 | 3.0 | 72 | 1.3198 |
| 0.8628 | 3.04 | 73 | 1.3183 |
| 0.8445 | 3.08 | 74 | 1.3189 |
| 0.9465 | 3.12 | 75 | 1.3212 |
| 0.8632 | 3.17 | 76 | 1.3263 |
| 0.9354 | 3.21 | 77 | 1.3327 |
| 0.9771 | 3.25 | 78 | 1.3410 |
| 0.9031 | 3.29 | 79 | 1.3495 |
| 0.8991 | 3.33 | 80 | 1.3637 |
| 0.7454 | 3.38 | 81 | 1.3819 |
| 0.8611 | 3.42 | 82 | 1.3995 |
| 0.947 | 3.46 | 83 | 1.4129 |
| 0.9534 | 3.5 | 84 | 1.4179 |
| 0.6619 | 3.54 | 85 | 1.4242 |
| 0.818 | 3.58 | 86 | 1.4292 |
| 0.9969 | 3.62 | 87 | 1.4276 |
| 0.8368 | 3.67 | 88 | 1.4218 |
| 0.7178 | 3.71 | 89 | 1.4149 |
| 0.9593 | 3.75 | 90 | 1.4088 |
| 0.859 | 3.79 | 91 | 1.4045 |
| 0.812 | 3.83 | 92 | 1.4036 |
| 0.6806 | 3.88 | 93 | 1.4027 |
| 0.9109 | 3.92 | 94 | 1.4005 |
| 0.9001 | 3.96 | 95 | 1.3973 |
| 0.6846 | 4.0 | 96 | 1.3926 |
| 0.8173 | 4.04 | 97 | 1.3904 |
| 0.7139 | 4.08 | 98 | 1.3892 |
| 0.6313 | 4.12 | 99 | 1.3912 |
| 0.8807 | 4.17 | 100 | 1.3924 |
| 0.6199 | 4.21 | 101 | 1.3957 |
| 0.7954 | 4.25 | 102 | 1.4024 |
| 0.7597 | 4.29 | 103 | 1.4106 |
| 0.6427 | 4.33 | 104 | 1.4211 |
| 0.7058 | 4.38 | 105 | 1.4329 |
| 0.6692 | 4.42 | 106 | 1.4447 |
| 0.6667 | 4.46 | 107 | 1.4570 |
| 0.7355 | 4.5 | 108 | 1.4674 |
| 0.8167 | 4.54 | 109 | 1.4707 |
| 0.81 | 4.58 | 110 | 1.4693 |
| 0.6473 | 4.62 | 111 | 1.4679 |
| 0.8049 | 4.67 | 112 | 1.4601 |
| 0.9267 | 4.71 | 113 | 1.4509 |
| 0.8384 | 4.75 | 114 | 1.4432 |
| 0.7355 | 4.79 | 115 | 1.4403 |
| 0.6937 | 4.83 | 116 | 1.4408 |
| 0.7224 | 4.88 | 117 | 1.4420 |
| 0.7238 | 4.92 | 118 | 1.4430 |
| 0.7761 | 4.96 | 119 | 1.4447 |
| 0.7301 | 5.0 | 120 | 1.4480 |
| 0.7274 | 5.04 | 121 | 1.4519 |
| 0.7926 | 5.08 | 122 | 1.4538 |
| 0.6853 | 5.12 | 123 | 1.4560 |
| 0.6107 | 5.17 | 124 | 1.4566 |
| 0.6582 | 5.21 | 125 | 1.4632 |
| 0.6247 | 5.25 | 126 | 1.4690 |
| 0.7176 | 5.29 | 127 | 1.4753 |
| 0.6065 | 5.33 | 128 | 1.4830 |
| 0.7235 | 5.38 | 129 | 1.4899 |
| 0.5553 | 5.42 | 130 | 1.4971 |
| 0.8425 | 5.46 | 131 | 1.4979 |
| 0.5815 | 5.5 | 132 | 1.4979 |
| 0.7062 | 5.54 | 133 | 1.4974 |
| 0.8297 | 5.58 | 134 | 1.4970 |
| 0.7131 | 5.62 | 135 | 1.4955 |
| 0.5478 | 5.67 | 136 | 1.4936 |
| 0.6922 | 5.71 | 137 | 1.4934 |
| 0.6594 | 5.75 | 138 | 1.4907 |
| 0.5619 | 5.79 | 139 | 1.4927 |
| 0.6018 | 5.83 | 140 | 1.4929 |
| 0.6148 | 5.88 | 141 | 1.4916 |
| 0.6707 | 5.92 | 142 | 1.4909 |
| 0.5154 | 5.96 | 143 | 1.4926 |
| 0.5522 | 6.0 | 144 | 1.4946 |
| 0.5371 | 6.04 | 145 | 1.4975 |
| 0.6454 | 6.08 | 146 | 1.5007 |
| 0.6201 | 6.12 | 147 | 1.5032 |
| 0.5702 | 6.17 | 148 | 1.5058 |
| 0.6221 | 6.21 | 149 | 1.5079 |
| 0.4378 | 6.25 | 150 | 1.5155 |
| 0.5449 | 6.29 | 151 | 1.5235 |
| 0.5292 | 6.33 | 152 | 1.5350 |
| 0.5256 | 6.38 | 153 | 1.5444 |
| 0.6434 | 6.42 | 154 | 1.5502 |
| 0.6314 | 6.46 | 155 | 1.5571 |
| 0.4914 | 6.5 | 156 | 1.5641 |
| 0.5439 | 6.54 | 157 | 1.5684 |
| 0.6915 | 6.58 | 158 | 1.5691 |
| 0.5191 | 6.62 | 159 | 1.5699 |
| 0.4984 | 6.67 | 160 | 1.5670 |
| 0.6393 | 6.71 | 161 | 1.5607 |
| 0.5726 | 6.75 | 162 | 1.5528 |
| 0.6291 | 6.79 | 163 | 1.5439 |
| 0.4941 | 6.83 | 164 | 1.5371 |
| 0.5799 | 6.88 | 165 | 1.5296 |
| 0.7036 | 6.92 | 166 | 1.5232 |
| 0.6837 | 6.96 | 167 | 1.5164 |
| 0.5574 | 7.0 | 168 | 1.5122 |
| 0.497 | 7.04 | 169 | 1.5108 |
| 0.5958 | 7.08 | 170 | 1.5129 |
| 0.5133 | 7.12 | 171 | 1.5198 |
| 0.6301 | 7.17 | 172 | 1.5272 |
| 0.4655 | 7.21 | 173 | 1.5372 |
| 0.5553 | 7.25 | 174 | 1.5469 |
| 0.4758 | 7.29 | 175 | 1.5582 |
| 0.5378 | 7.33 | 176 | 1.5684 |
| 0.558 | 7.38 | 177 | 1.5755 |
| 0.4848 | 7.42 | 178 | 1.5852 |
| 0.5505 | 7.46 | 179 | 1.5976 |
| 0.4846 | 7.5 | 180 | 1.6082 |
| 0.5397 | 7.54 | 181 | 1.6155 |
| 0.5573 | 7.58 | 182 | 1.6235 |
| 0.5175 | 7.62 | 183 | 1.6294 |
| 0.5925 | 7.67 | 184 | 1.6321 |
| 0.3948 | 7.71 | 185 | 1.6322 |
| 0.5051 | 7.75 | 186 | 1.6277 |
| 0.5424 | 7.79 | 187 | 1.6222 |
| 0.4412 | 7.83 | 188 | 1.6176 |
| 0.5366 | 7.88 | 189 | 1.6116 |
| 0.634 | 7.92 | 190 | 1.6057 |
| 0.5751 | 7.96 | 191 | 1.6005 |
| 0.6559 | 8.0 | 192 | 1.5957 |
| 0.6267 | 8.04 | 193 | 1.5927 |
| 0.4963 | 8.08 | 194 | 1.5911 |
| 0.4596 | 8.12 | 195 | 1.5910 |
| 0.4256 | 8.17 | 196 | 1.5943 |
| 0.3317 | 8.21 | 197 | 1.5992 |
| 0.4947 | 8.25 | 198 | 1.6035 |
| 0.3818 | 8.29 | 199 | 1.6093 |
| 0.4051 | 8.33 | 200 | 1.6160 |
| 0.4711 | 8.38 | 201 | 1.6231 |
| 0.5212 | 8.42 | 202 | 1.6314 |
| 0.4093 | 8.46 | 203 | 1.6411 |
| 0.5262 | 8.5 | 204 | 1.6498 |
| 0.5127 | 8.54 | 205 | 1.6546 |
| 0.5028 | 8.58 | 206 | 1.6579 |
| 0.4227 | 8.62 | 207 | 1.6578 |
| 0.5271 | 8.67 | 208 | 1.6553 |
| 0.435 | 8.71 | 209 | 1.6515 |
| 0.4956 | 8.75 | 210 | 1.6431 |
| 0.4213 | 8.79 | 211 | 1.6347 |
| 0.4663 | 8.83 | 212 | 1.6299 |
| 0.4648 | 8.88 | 213 | 1.6274 |
| 0.4139 | 8.92 | 214 | 1.6277 |
| 0.379 | 8.96 | 215 | 1.6322 |
| 0.6425 | 9.0 | 216 | 1.6377 |
| 0.3229 | 9.04 | 217 | 1.6466 |
| 0.3898 | 9.08 | 218 | 1.6573 |
| 0.4358 | 9.12 | 219 | 1.6648 |
| 0.4467 | 9.17 | 220 | 1.6753 |
| 0.3967 | 9.21 | 221 | 1.6837 |
| 0.2943 | 9.25 | 222 | 1.6931 |
| 0.4568 | 9.29 | 223 | 1.7022 |
| 0.4703 | 9.33 | 224 | 1.7088 |
| 0.392 | 9.38 | 225 | 1.7163 |
| 0.3981 | 9.42 | 226 | 1.7242 |
| 0.4152 | 9.46 | 227 | 1.7296 |
| 0.4763 | 9.5 | 228 | 1.7316 |
| 0.5617 | 9.54 | 229 | 1.7306 |
| 0.3502 | 9.58 | 230 | 1.7279 |
| 0.4049 | 9.62 | 231 | 1.7266 |
| 0.4626 | 9.67 | 232 | 1.7273 |
| 0.4545 | 9.71 | 233 | 1.7270 |
| 0.5877 | 9.75 | 234 | 1.7228 |
| 0.3715 | 9.79 | 235 | 1.7186 |
| 0.4231 | 9.83 | 236 | 1.7190 |
| 0.504 | 9.88 | 237 | 1.7194 |
| 0.503 | 9.92 | 238 | 1.7198 |
| 0.4163 | 9.96 | 239 | 1.7214 |
| 0.5041 | 10.0 | 240 | 1.7209 |
| 0.3248 | 10.04 | 241 | 1.7225 |
| 0.4198 | 10.08 | 242 | 1.7254 |
| 0.4131 | 10.12 | 243 | 1.7285 |
| 0.4489 | 10.17 | 244 | 1.7319 |
| 0.3786 | 10.21 | 245 | 1.7381 |
| 0.3118 | 10.25 | 246 | 1.7451 |
| 0.3234 | 10.29 | 247 | 1.7512 |
| 0.4524 | 10.33 | 248 | 1.7555 |
| 0.3746 | 10.38 | 249 | 1.7594 |
| 0.3793 | 10.42 | 250 | 1.7621 |
| 0.4452 | 10.46 | 251 | 1.7607 |
| 0.3982 | 10.5 | 252 | 1.7587 |
| 0.4032 | 10.54 | 253 | 1.7575 |
| 0.3566 | 10.58 | 254 | 1.7554 |
| 0.2845 | 10.62 | 255 | 1.7543 |
| 0.3976 | 10.67 | 256 | 1.7541 |
| 0.2432 | 10.71 | 257 | 1.7560 |
| 0.4435 | 10.75 | 258 | 1.7565 |
| 0.4078 | 10.79 | 259 | 1.7548 |
| 0.311 | 10.83 | 260 | 1.7526 |
| 0.3831 | 10.88 | 261 | 1.7492 |
| 0.3951 | 10.92 | 262 | 1.7454 |
| 0.341 | 10.96 | 263 | 1.7432 |
| 0.4266 | 11.0 | 264 | 1.7418 |
| 0.4136 | 11.04 | 265 | 1.7392 |
| 0.3152 | 11.08 | 266 | 1.7390 |
| 0.4412 | 11.12 | 267 | 1.7394 |
| 0.3114 | 11.17 | 268 | 1.7420 |
| 0.4265 | 11.21 | 269 | 1.7457 |
| 0.3329 | 11.25 | 270 | 1.7515 |
| 0.3281 | 11.29 | 271 | 1.7568 |
| 0.4069 | 11.33 | 272 | 1.7607 |
| 0.3584 | 11.38 | 273 | 1.7639 |
| 0.3881 | 11.42 | 274 | 1.7664 |
| 0.4184 | 11.46 | 275 | 1.7682 |
| 0.3187 | 11.5 | 276 | 1.7692 |
| 0.389 | 11.54 | 277 | 1.7681 |
| 0.4005 | 11.58 | 278 | 1.7683 |
| 0.3346 | 11.62 | 279 | 1.7709 |
| 0.4174 | 11.67 | 280 | 1.7725 |
| 0.342 | 11.71 | 281 | 1.7761 |
| 0.3049 | 11.75 | 282 | 1.7794 |
| 0.2549 | 11.79 | 283 | 1.7822 |
| 0.3893 | 11.83 | 284 | 1.7849 |
| 0.3414 | 11.88 | 285 | 1.7884 |
| 0.2519 | 11.92 | 286 | 1.7923 |
| 0.4489 | 11.96 | 287 | 1.7948 |
| 0.2376 | 12.0 | 288 | 1.7967 |
| 0.389 | 12.04 | 289 | 1.7964 |
| 0.314 | 12.08 | 290 | 1.7963 |
| 0.2555 | 12.12 | 291 | 1.7963 |
| 0.3484 | 12.17 | 292 | 1.7958 |
| 0.345 | 12.21 | 293 | 1.7954 |
| 0.3589 | 12.25 | 294 | 1.7945 |
| 0.2968 | 12.29 | 295 | 1.7940 |
| 0.272 | 12.33 | 296 | 1.7947 |
| 0.3022 | 12.38 | 297 | 1.7989 |
| 0.4437 | 12.42 | 298 | 1.8041 |
| 0.2962 | 12.46 | 299 | 1.8072 |
| 0.3557 | 12.5 | 300 | 1.8098 |
| 0.2633 | 12.54 | 301 | 1.8131 |
| 0.3529 | 12.58 | 302 | 1.8152 |
| 0.3046 | 12.62 | 303 | 1.8172 |
| 0.3453 | 12.67 | 304 | 1.8187 |
| 0.367 | 12.71 | 305 | 1.8210 |
| 0.3454 | 12.75 | 306 | 1.8236 |
| 0.3081 | 12.79 | 307 | 1.8248 |
| 0.4073 | 12.83 | 308 | 1.8256 |
| 0.2942 | 12.88 | 309 | 1.8258 |
| 0.3029 | 12.92 | 310 | 1.8262 |
| 0.3874 | 12.96 | 311 | 1.8276 |
| 0.3773 | 13.0 | 312 | 1.8287 |
| 0.3005 | 13.04 | 313 | 1.8303 |
| 0.2496 | 13.08 | 314 | 1.8318 |
| 0.3743 | 13.12 | 315 | 1.8323 |
| 0.2532 | 13.17 | 316 | 1.8333 |
| 0.3451 | 13.21 | 317 | 1.8346 |
| 0.3743 | 13.25 | 318 | 1.8349 |
| 0.3334 | 13.29 | 319 | 1.8350 |
| 0.2377 | 13.33 | 320 | 1.8362 |
| 0.2977 | 13.38 | 321 | 1.8379 |
| 0.2956 | 13.42 | 322 | 1.8396 |
| 0.3779 | 13.46 | 323 | 1.8406 |
| 0.3397 | 13.5 | 324 | 1.8411 |
| 0.3656 | 13.54 | 325 | 1.8414 |
| 0.35 | 13.58 | 326 | 1.8407 |
| 0.3831 | 13.62 | 327 | 1.8404 |
| 0.3935 | 13.67 | 328 | 1.8391 |
| 0.3442 | 13.71 | 329 | 1.8381 |
| 0.33 | 13.75 | 330 | 1.8376 |
| 0.2586 | 13.79 | 331 | 1.8375 |
| 0.349 | 13.83 | 332 | 1.8378 |
| 0.3148 | 13.88 | 333 | 1.8381 |
| 0.2371 | 13.92 | 334 | 1.8386 |
| 0.3977 | 13.96 | 335 | 1.8388 |
| 0.2817 | 14.0 | 336 | 1.8391 |
| 0.258 | 14.04 | 337 | 1.8396 |
| 0.3157 | 14.08 | 338 | 1.8405 |
| 0.2292 | 14.12 | 339 | 1.8413 |
| 0.3611 | 14.17 | 340 | 1.8421 |
| 0.3431 | 14.21 | 341 | 1.8432 |
| 0.2391 | 14.25 | 342 | 1.8445 |
| 0.2542 | 14.29 | 343 | 1.8459 |
| 0.3984 | 14.33 | 344 | 1.8469 |
| 0.2329 | 14.38 | 345 | 1.8480 |
| 0.3375 | 14.42 | 346 | 1.8486 |
| 0.2692 | 14.46 | 347 | 1.8491 |
| 0.4525 | 14.5 | 348 | 1.8495 |
| 0.3417 | 14.54 | 349 | 1.8499 |
| 0.3488 | 14.58 | 350 | 1.8502 |
| 0.3834 | 14.62 | 351 | 1.8504 |
| 0.269 | 14.67 | 352 | 1.8507 |
| 0.2713 | 14.71 | 353 | 1.8510 |
| 0.2776 | 14.75 | 354 | 1.8514 |
| 0.3643 | 14.79 | 355 | 1.8518 |
| 0.3205 | 14.83 | 356 | 1.8521 |
| 0.305 | 14.88 | 357 | 1.8525 |
| 0.2691 | 14.92 | 358 | 1.8527 |
| 0.3422 | 14.96 | 359 | 1.8529 |
| 0.2489 | 15.0 | 360 | 1.8530 |
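
Note that the validation loss bottoms out around 1.31 within the first few epochs and then climbs steadily to 1.8530 while the training loss keeps falling, a pattern consistent with overfitting. If retraining, one option is the Transformers Trainer's built-in early stopping; the sketch below is an assumed configuration, not part of the original run.

```python
from transformers import EarlyStoppingCallback, Seq2SeqTrainingArguments

# Assumed retraining configuration (not from the original run): keep the
# checkpoint with the lowest validation loss and stop once it plateaus.
args = Seq2SeqTrainingArguments(
    output_dir="flant5-tuned-15-warmup",  # assumed
    evaluation_strategy="steps",
    eval_steps=1,                         # mirrors the per-step evaluation above
    save_strategy="steps",
    save_steps=1,
    load_best_model_at_end=True,
    metric_for_best_model="eval_loss",
    greater_is_better=False,
)

# Passed via Trainer(callbacks=[...]); halts training after 10 consecutive
# evaluations without improvement.
early_stopping = EarlyStoppingCallback(early_stopping_patience=10)
```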

### Framework versions

- Transformers 4.38.1
- Pytorch 2.1.0+cu121
- Datasets 2.17.0
- Tokenizers 0.15.2