---
license: apache-2.0
base_model: google/flan-t5-small
tags:
- generated_from_trainer
model-index:
- name: flant5-tuned-15-warmup
  results: []
---

# flant5-tuned-15-warmup

This model is a fine-tuned version of [google/flan-t5-small](https://huggingface.co/google/flan-t5-small) on an unspecified dataset.
It achieves the following results on the evaluation set at the end of training (step 360):
- Loss: 1.8530

## Model description

More information needed

## Intended uses & limitations

More information needed

## Training and evaluation data

More information needed

## Training procedure

### Training hyperparameters

The following hyperparameters were used during training:
- learning_rate: 0.0005
- train_batch_size: 16
- eval_batch_size: 16
- seed: 42
- optimizer: Adam with betas=(0.9,0.999) and epsilon=1e-08
- lr_scheduler_type: linear
- lr_scheduler_warmup_steps: 3
- num_epochs: 15

### Training results

| Training Loss | Epoch | Step | Validation Loss |
|:-------------:|:-----:|:----:|:---------------:|
| 2.1664 | 0.04 | 1 | 1.8055 |
| 2.3197 | 0.08 | 2 | 1.6900 |
| 1.7394 | 0.12 | 3 | 1.5971 |
| 1.7463 | 0.17 | 4 | 1.5465 |
| 1.998 | 0.21 | 5 | 1.4992 |
| 1.7127 | 0.25 | 6 | 1.4731 |
| 1.4435 | 0.29 | 7 | 1.4587 |
| 1.5656 | 0.33 | 8 | 1.4509 |
| 1.8111 | 0.38 | 9 | 1.4381 |
| 1.7371 | 0.42 | 10 | 1.4286 |
| 1.3207 | 0.46 | 11 | 1.4276 |
| 1.4292 | 0.5 | 12 | 1.4301 |
| 1.7736 | 0.54 | 13 | 1.4260 |
| 1.628 | 0.58 | 14 | 1.4183 |
| 2.0071 | 0.62 | 15 | 1.4017 |
| 1.2926 | 0.67 | 16 | 1.3891 |
| 1.5699 | 0.71 | 17 | 1.3769 |
| 1.5532 | 0.75 | 18 | 1.3664 |
| 1.4906 | 0.79 | 19 | 1.3580 |
| 1.6887 | 0.83 | 20 | 1.3502 |
| 1.1568 | 0.88 | 21 | 1.3462 |
| 1.3267 | 0.92 | 22 | 1.3443 |
| 1.4873 | 0.96 | 23 | 1.3447 |
| 2.3224 | 1.0 | 24 | 1.3463 |
| 1.0034 | 1.04 | 25 | 1.3531 |
| 1.0839 | 1.08 | 26 | 1.3627 |
| 1.4106 | 1.12 | 27 | 1.3660 |
| 1.3157 | 1.17 | 28 | 1.3671 |
| 1.3775 | 1.21 | 29 | 1.3600 |
| 1.0208 | 1.25 | 30 | 1.3538 |
| 1.0619 | 1.29 | 31 | 1.3459 |
| 1.5116 | 1.33 | 32 | 1.3366 |
| 1.242 | 1.38 | 33 | 1.3282 |
| 1.3132 | 1.42 | 34 | 1.3204 |
| 1.2549 | 1.46 | 35 | 1.3152 |
| 1.1817 | 1.5 | 36 | 1.3105 |
| 1.3416 | 1.54 | 37 | 1.3068 |
| 1.1483 | 1.58 | 38 | 1.3078 |
| 1.2318 | 1.62 | 39 | 1.3092 |
| 1.2906 | 1.67 | 40 | 1.3128 |
| 1.2147 | 1.71 | 41 | 1.3175 |
| 1.184 | 1.75 | 42 | 1.3222 |
| 1.1648 | 1.79 | 43 | 1.3295 |
| 1.3964 | 1.83 | 44 | 1.3331 |
| 1.1426 | 1.88 | 45 | 1.3380 |
| 1.3105 | 1.92 | 46 | 1.3378 |
| 0.8797 | 1.96 | 47 | 1.3375 |
| 0.9167 | 2.0 | 48 | 1.3380 |
| 1.0769 | 2.04 | 49 | 1.3368 |
| 1.1431 | 2.08 | 50 | 1.3392 |
| 1.0875 | 2.12 | 51 | 1.3423 |
| 0.8795 | 2.17 | 52 | 1.3458 |
| 0.9912 | 2.21 | 53 | 1.3484 |
| 0.9544 | 2.25 | 54 | 1.3505 |
| 0.9606 | 2.29 | 55 | 1.3499 |
| 1.0507 | 2.33 | 56 | 1.3507 |
| 0.7372 | 2.38 | 57 | 1.3561 |
| 0.9674 | 2.42 | 58 | 1.3612 |
| 0.8529 | 2.46 | 59 | 1.3708 |
| 0.9837 | 2.5 | 60 | 1.3743 |
| 1.216 | 2.54 | 61 | 1.3748 |
| 1.0467 | 2.58 | 62 | 1.3729 |
| 0.7987 | 2.62 | 63 | 1.3726 |
| 1.0887 | 2.67 | 64 | 1.3711 |
| 0.873 | 2.71 | 65 | 1.3692 |
| 1.0544 | 2.75 | 66 | 1.3665 |
| 0.9743 | 2.79 | 67 | 1.3603 |
| 0.9914 | 2.83 | 68 | 1.3517 |
| 1.2247 | 2.88 | 69 | 1.3429 |
| 1.1727 | 2.92 | 70 | 1.3338 |
| 0.9567 | 2.96 | 71 | 1.3263 |
| 1.365 | 3.0 | 72 | 1.3198 |
| 0.8628 | 3.04 | 73 | 1.3183 |
| 0.8445 | 3.08 | 74 | 1.3189 |
| 0.9465 | 3.12 | 75 | 1.3212 |
| 0.8632 | 3.17 | 76 | 1.3263 |
| 0.9354 | 3.21 | 77 | 1.3327 |
| 0.9771 | 3.25 | 78 | 1.3410 |
| 0.9031 | 3.29 | 79 | 1.3495 |
| 0.8991 | 3.33 | 80 | 1.3637 |
| 0.7454 | 3.38 | 81 | 1.3819 |
| 0.8611 | 3.42 | 82 | 1.3995 |
| 0.947 | 3.46 | 83 | 1.4129 |
| 0.9534 | 3.5 | 84 | 1.4179 |
| 0.6619 | 3.54 | 85 | 1.4242 |
| 0.818 | 3.58 | 86 | 1.4292 |
| 0.9969 | 3.62 | 87 | 1.4276 |
| 0.8368 | 3.67 | 88 | 1.4218 |
| 0.7178 | 3.71 | 89 | 1.4149 |
| 0.9593 | 3.75 | 90 | 1.4088 |
| 0.859 | 3.79 | 91 | 1.4045 |
| 0.812 | 3.83 | 92 | 1.4036 |
| 0.6806 | 3.88 | 93 | 1.4027 |
| 0.9109 | 3.92 | 94 | 1.4005 |
| 0.9001 | 3.96 | 95 | 1.3973 |
| 0.6846 | 4.0 | 96 | 1.3926 |
| 0.8173 | 4.04 | 97 | 1.3904 |
| 0.7139 | 4.08 | 98 | 1.3892 |
| 0.6313 | 4.12 | 99 | 1.3912 |
| 0.8807 | 4.17 | 100 | 1.3924 |
| 0.6199 | 4.21 | 101 | 1.3957 |
| 0.7954 | 4.25 | 102 | 1.4024 |
| 0.7597 | 4.29 | 103 | 1.4106 |
| 0.6427 | 4.33 | 104 | 1.4211 |
| 0.7058 | 4.38 | 105 | 1.4329 |
| 0.6692 | 4.42 | 106 | 1.4447 |
| 0.6667 | 4.46 | 107 | 1.4570 |
| 0.7355 | 4.5 | 108 | 1.4674 |
| 0.8167 | 4.54 | 109 | 1.4707 |
| 0.81 | 4.58 | 110 | 1.4693 |
| 0.6473 | 4.62 | 111 | 1.4679 |
| 0.8049 | 4.67 | 112 | 1.4601 |
| 0.9267 | 4.71 | 113 | 1.4509 |
| 0.8384 | 4.75 | 114 | 1.4432 |
| 0.7355 | 4.79 | 115 | 1.4403 |
| 0.6937 | 4.83 | 116 | 1.4408 |
| 0.7224 | 4.88 | 117 | 1.4420 |
| 0.7238 | 4.92 | 118 | 1.4430 |
| 0.7761 | 4.96 | 119 | 1.4447 |
| 0.7301 | 5.0 | 120 | 1.4480 |
| 0.7274 | 5.04 | 121 | 1.4519 |
| 0.7926 | 5.08 | 122 | 1.4538 |
| 0.6853 | 5.12 | 123 | 1.4560 |
| 0.6107 | 5.17 | 124 | 1.4566 |
| 0.6582 | 5.21 | 125 | 1.4632 |
| 0.6247 | 5.25 | 126 | 1.4690 |
| 0.7176 | 5.29 | 127 | 1.4753 |
| 0.6065 | 5.33 | 128 | 1.4830 |
| 0.7235 | 5.38 | 129 | 1.4899 |
| 0.5553 | 5.42 | 130 | 1.4971 |
| 0.8425 | 5.46 | 131 | 1.4979 |
| 0.5815 | 5.5 | 132 | 1.4979 |
| 0.7062 | 5.54 | 133 | 1.4974 |
| 0.8297 | 5.58 | 134 | 1.4970 |
| 0.7131 | 5.62 | 135 | 1.4955 |
| 0.5478 | 5.67 | 136 | 1.4936 |
| 0.6922 | 5.71 | 137 | 1.4934 |
| 0.6594 | 5.75 | 138 | 1.4907 |
| 0.5619 | 5.79 | 139 | 1.4927 |
| 0.6018 | 5.83 | 140 | 1.4929 |
| 0.6148 | 5.88 | 141 | 1.4916 |
| 0.6707 | 5.92 | 142 | 1.4909 |
| 0.5154 | 5.96 | 143 | 1.4926 |
| 0.5522 | 6.0 | 144 | 1.4946 |
| 0.5371 | 6.04 | 145 | 1.4975 |
| 0.6454 | 6.08 | 146 | 1.5007 |
| 0.6201 | 6.12 | 147 | 1.5032 |
| 0.5702 | 6.17 | 148 | 1.5058 |
| 0.6221 | 6.21 | 149 | 1.5079 |
| 0.4378 | 6.25 | 150 | 1.5155 |
| 0.5449 | 6.29 | 151 | 1.5235 |
| 0.5292 | 6.33 | 152 | 1.5350 |
| 0.5256 | 6.38 | 153 | 1.5444 |
| 0.6434 | 6.42 | 154 | 1.5502 |
| 0.6314 | 6.46 | 155 | 1.5571 |
| 0.4914 | 6.5 | 156 | 1.5641 |
| 0.5439 | 6.54 | 157 | 1.5684 |
| 0.6915 | 6.58 | 158 | 1.5691 |
| 0.5191 | 6.62 | 159 | 1.5699 |
| 0.4984 | 6.67 | 160 | 1.5670 |
| 0.6393 | 6.71 | 161 | 1.5607 |
| 0.5726 | 6.75 | 162 | 1.5528 |
| 0.6291 | 6.79 | 163 | 1.5439 |
| 0.4941 | 6.83 | 164 | 1.5371 |
| 0.5799 | 6.88 | 165 | 1.5296 |
| 0.7036 | 6.92 | 166 | 1.5232 |
| 0.6837 | 6.96 | 167 | 1.5164 |
| 0.5574 | 7.0 | 168 | 1.5122 |
| 0.497 | 7.04 | 169 | 1.5108 |
| 0.5958 | 7.08 | 170 | 1.5129 |
| 0.5133 | 7.12 | 171 | 1.5198 |
| 0.6301 | 7.17 | 172 | 1.5272 |
| 0.4655 | 7.21 | 173 | 1.5372 |
| 0.5553 | 7.25 | 174 | 1.5469 |
| 0.4758 | 7.29 | 175 | 1.5582 |
| 0.5378 | 7.33 | 176 | 1.5684 |
| 0.558 | 7.38 | 177 | 1.5755 |
| 0.4848 | 7.42 | 178 | 1.5852 |
| 0.5505 | 7.46 | 179 | 1.5976 |
| 0.4846 | 7.5 | 180 | 1.6082 |
| 0.5397 | 7.54 | 181 | 1.6155 |
| 0.5573 | 7.58 | 182 | 1.6235 |
| 0.5175 | 7.62 | 183 | 1.6294 |
| 0.5925 | 7.67 | 184 | 1.6321 |
| 0.3948 | 7.71 | 185 | 1.6322 |
| 0.5051 | 7.75 | 186 | 1.6277 |
| 0.5424 | 7.79 | 187 | 1.6222 |
| 0.4412 | 7.83 | 188 | 1.6176 |
| 0.5366 | 7.88 | 189 | 1.6116 |
| 0.634 | 7.92 | 190 | 1.6057 |
| 0.5751 | 7.96 | 191 | 1.6005 |
| 0.6559 | 8.0 | 192 | 1.5957 |
| 0.6267 | 8.04 | 193 | 1.5927 |
| 0.4963 | 8.08 | 194 | 1.5911 |
| 0.4596 | 8.12 | 195 | 1.5910 |
| 0.4256 | 8.17 | 196 | 1.5943 |
| 0.3317 | 8.21 | 197 | 1.5992 |
| 0.4947 | 8.25 | 198 | 1.6035 |
| 0.3818 | 8.29 | 199 | 1.6093 |
| 0.4051 | 8.33 | 200 | 1.6160 |
| 0.4711 | 8.38 | 201 | 1.6231 |
| 0.5212 | 8.42 | 202 | 1.6314 |
| 0.4093 | 8.46 | 203 | 1.6411 |
| 0.5262 | 8.5 | 204 | 1.6498 |
| 0.5127 | 8.54 | 205 | 1.6546 |
| 0.5028 | 8.58 | 206 | 1.6579 |
| 0.4227 | 8.62 | 207 | 1.6578 |
| 0.5271 | 8.67 | 208 | 1.6553 |
| 0.435 | 8.71 | 209 | 1.6515 |
| 0.4956 | 8.75 | 210 | 1.6431 |
| 0.4213 | 8.79 | 211 | 1.6347 |
| 0.4663 | 8.83 | 212 | 1.6299 |
| 0.4648 | 8.88 | 213 | 1.6274 |
| 0.4139 | 8.92 | 214 | 1.6277 |
| 0.379 | 8.96 | 215 | 1.6322 |
| 0.6425 | 9.0 | 216 | 1.6377 |
| 0.3229 | 9.04 | 217 | 1.6466 |
| 0.3898 | 9.08 | 218 | 1.6573 |
| 0.4358 | 9.12 | 219 | 1.6648 |
| 0.4467 | 9.17 | 220 | 1.6753 |
| 0.3967 | 9.21 | 221 | 1.6837 |
| 0.2943 | 9.25 | 222 | 1.6931 |
| 0.4568 | 9.29 | 223 | 1.7022 |
| 0.4703 | 9.33 | 224 | 1.7088 |
| 0.392 | 9.38 | 225 | 1.7163 |
| 0.3981 | 9.42 | 226 | 1.7242 |
| 0.4152 | 9.46 | 227 | 1.7296 |
| 0.4763 | 9.5 | 228 | 1.7316 |
| 0.5617 | 9.54 | 229 | 1.7306 |
| 0.3502 | 9.58 | 230 | 1.7279 |
| 0.4049 | 9.62 | 231 | 1.7266 |
| 0.4626 | 9.67 | 232 | 1.7273 |
| 0.4545 | 9.71 | 233 | 1.7270 |
| 0.5877 | 9.75 | 234 | 1.7228 |
| 0.3715 | 9.79 | 235 | 1.7186 |
| 0.4231 | 9.83 | 236 | 1.7190 |
| 0.504 | 9.88 | 237 | 1.7194 |
| 0.503 | 9.92 | 238 | 1.7198 |
| 0.4163 | 9.96 | 239 | 1.7214 |
| 0.5041 | 10.0 | 240 | 1.7209 |
| 0.3248 | 10.04 | 241 | 1.7225 |
| 0.4198 | 10.08 | 242 | 1.7254 |
| 0.4131 | 10.12 | 243 | 1.7285 |
| 0.4489 | 10.17 | 244 | 1.7319 |
| 0.3786 | 10.21 | 245 | 1.7381 |
| 0.3118 | 10.25 | 246 | 1.7451 |
| 0.3234 | 10.29 | 247 | 1.7512 |
| 0.4524 | 10.33 | 248 | 1.7555 |
| 0.3746 | 10.38 | 249 | 1.7594 |
| 0.3793 | 10.42 | 250 | 1.7621 |
| 0.4452 | 10.46 | 251 | 1.7607 |
| 0.3982 | 10.5 | 252 | 1.7587 |
| 0.4032 | 10.54 | 253 | 1.7575 |
| 0.3566 | 10.58 | 254 | 1.7554 |
| 0.2845 | 10.62 | 255 | 1.7543 |
| 0.3976 | 10.67 | 256 | 1.7541 |
| 0.2432 | 10.71 | 257 | 1.7560 |
| 0.4435 | 10.75 | 258 | 1.7565 |
| 0.4078 | 10.79 | 259 | 1.7548 |
| 0.311 | 10.83 | 260 | 1.7526 |
| 0.3831 | 10.88 | 261 | 1.7492 |
| 0.3951 | 10.92 | 262 | 1.7454 |
| 0.341 | 10.96 | 263 | 1.7432 |
| 0.4266 | 11.0 | 264 | 1.7418 |
| 0.4136 | 11.04 | 265 | 1.7392 |
| 0.3152 | 11.08 | 266 | 1.7390 |
| 0.4412 | 11.12 | 267 | 1.7394 |
| 0.3114 | 11.17 | 268 | 1.7420 |
| 0.4265 | 11.21 | 269 | 1.7457 |
| 0.3329 | 11.25 | 270 | 1.7515 |
| 0.3281 | 11.29 | 271 | 1.7568 |
| 0.4069 | 11.33 | 272 | 1.7607 |
| 0.3584 | 11.38 | 273 | 1.7639 |
| 0.3881 | 11.42 | 274 | 1.7664 |
| 0.4184 | 11.46 | 275 | 1.7682 |
| 0.3187 | 11.5 | 276 | 1.7692 |
| 0.389 | 11.54 | 277 | 1.7681 |
| 0.4005 | 11.58 | 278 | 1.7683 |
| 0.3346 | 11.62 | 279 | 1.7709 |
| 0.4174 | 11.67 | 280 | 1.7725 |
| 0.342 | 11.71 | 281 | 1.7761 |
| 0.3049 | 11.75 | 282 | 1.7794 |
| 0.2549 | 11.79 | 283 | 1.7822 |
| 0.3893 | 11.83 | 284 | 1.7849 |
| 0.3414 | 11.88 | 285 | 1.7884 |
| 0.2519 | 11.92 | 286 | 1.7923 |
| 0.4489 | 11.96 | 287 | 1.7948 |
| 0.2376 | 12.0 | 288 | 1.7967 |
| 0.389 | 12.04 | 289 | 1.7964 |
| 0.314 | 12.08 | 290 | 1.7963 |
| 0.2555 | 12.12 | 291 | 1.7963 |
| 0.3484 | 12.17 | 292 | 1.7958 |
| 0.345 | 12.21 | 293 | 1.7954 |
| 0.3589 | 12.25 | 294 | 1.7945 |
| 0.2968 | 12.29 | 295 | 1.7940 |
| 0.272 | 12.33 | 296 | 1.7947 |
| 0.3022 | 12.38 | 297 | 1.7989 |
| 0.4437 | 12.42 | 298 | 1.8041 |
| 0.2962 | 12.46 | 299 | 1.8072 |
| 0.3557 | 12.5 | 300 | 1.8098 |
| 0.2633 | 12.54 | 301 | 1.8131 |
| 0.3529 | 12.58 | 302 | 1.8152 |
| 0.3046 | 12.62 | 303 | 1.8172 |
| 0.3453 | 12.67 | 304 | 1.8187 |
| 0.367 | 12.71 | 305 | 1.8210 |
| 0.3454 | 12.75 | 306 | 1.8236 |
| 0.3081 | 12.79 | 307 | 1.8248 |
| 0.4073 | 12.83 | 308 | 1.8256 |
| 0.2942 | 12.88 | 309 | 1.8258 |
| 0.3029 | 12.92 | 310 | 1.8262 |
| 0.3874 | 12.96 | 311 | 1.8276 |
| 0.3773 | 13.0 | 312 | 1.8287 |
| 0.3005 | 13.04 | 313 | 1.8303 |
| 0.2496 | 13.08 | 314 | 1.8318 |
| 0.3743 | 13.12 | 315 | 1.8323 |
| 0.2532 | 13.17 | 316 | 1.8333 |
| 0.3451 | 13.21 | 317 | 1.8346 |
| 0.3743 | 13.25 | 318 | 1.8349 |
| 0.3334 | 13.29 | 319 | 1.8350 |
| 0.2377 | 13.33 | 320 | 1.8362 |
| 0.2977 | 13.38 | 321 | 1.8379 |
| 0.2956 | 13.42 | 322 | 1.8396 |
| 0.3779 | 13.46 | 323 | 1.8406 |
| 0.3397 | 13.5 | 324 | 1.8411 |
| 0.3656 | 13.54 | 325 | 1.8414 |
| 0.35 | 13.58 | 326 | 1.8407 |
| 0.3831 | 13.62 | 327 | 1.8404 |
| 0.3935 | 13.67 | 328 | 1.8391 |
| 0.3442 | 13.71 | 329 | 1.8381 |
| 0.33 | 13.75 | 330 | 1.8376 |
| 0.2586 | 13.79 | 331 | 1.8375 |
| 0.349 | 13.83 | 332 | 1.8378 |
| 0.3148 | 13.88 | 333 | 1.8381 |
| 0.2371 | 13.92 | 334 | 1.8386 |
| 0.3977 | 13.96 | 335 | 1.8388 |
| 0.2817 | 14.0 | 336 | 1.8391 |
| 0.258 | 14.04 | 337 | 1.8396 |
| 0.3157 | 14.08 | 338 | 1.8405 |
| 0.2292 | 14.12 | 339 | 1.8413 |
| 0.3611 | 14.17 | 340 | 1.8421 |
| 0.3431 | 14.21 | 341 | 1.8432 |
| 0.2391 | 14.25 | 342 | 1.8445 |
| 0.2542 | 14.29 | 343 | 1.8459 |
| 0.3984 | 14.33 | 344 | 1.8469 |
| 0.2329 | 14.38 | 345 | 1.8480 |
| 0.3375 | 14.42 | 346 | 1.8486 |
| 0.2692 | 14.46 | 347 | 1.8491 |
| 0.4525 | 14.5 | 348 | 1.8495 |
| 0.3417 | 14.54 | 349 | 1.8499 |
| 0.3488 | 14.58 | 350 | 1.8502 |
| 0.3834 | 14.62 | 351 | 1.8504 |
| 0.269 | 14.67 | 352 | 1.8507 |
| 0.2713 | 14.71 | 353 | 1.8510 |
| 0.2776 | 14.75 | 354 | 1.8514 |
| 0.3643 | 14.79 | 355 | 1.8518 |
| 0.3205 | 14.83 | 356 | 1.8521 |
| 0.305 | 14.88 | 357 | 1.8525 |
| 0.2691 | 14.92 | 358 | 1.8527 |
| 0.3422 | 14.96 | 359 | 1.8529 |
| 0.2489 | 15.0 | 360 | 1.8530 |

### Framework versions

- Transformers 4.38.1
- Pytorch 2.1.0+cu121
- Datasets 2.17.0
- Tokenizers 0.15.2
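
### Learning rate schedule (illustrative)

The training hyperparameters listed in this card (`learning_rate: 0.0005`, `lr_scheduler_type: linear`, `lr_scheduler_warmup_steps: 3`) and the 360 logged steps imply a particular learning-rate curve. The sketch below is a minimal, dependency-free approximation of it, assuming the usual shape of a linear scheduler: a linear ramp from zero during warmup, then a linear decay to zero at the last step. The function name `linear_warmup_lr` is hypothetical and is not part of any library, and the actual training script is not included in this card.

```python
def linear_warmup_lr(step, base_lr=5e-4, warmup_steps=3, total_steps=360):
    """Approximate learning rate after `step` optimizer updates.

    Assumption: linear warmup from 0 to base_lr over `warmup_steps`,
    then linear decay from base_lr to 0 at `total_steps`.
    Defaults are taken from the hyperparameters and step count in this card.
    """
    if step < warmup_steps:
        # Warmup phase: scale up proportionally to the step index.
        return base_lr * step / max(1, warmup_steps)
    # Decay phase: proportional to the number of steps remaining.
    remaining = max(0, total_steps - step)
    return base_lr * remaining / max(1, total_steps - warmup_steps)


if __name__ == "__main__":
    # Print the approximate learning rate at a few points in training.
    for s in (0, 1, 2, 3, 180, 360):
        print(f"step {s:3d}: lr = {linear_warmup_lr(s):.6f}")
```

With only 3 warmup steps out of 360, the warmup covers under 1% of training, so under this assumption the learning rate sits near its peak during the first epochs and decays for essentially the whole run.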