metadata
base_model: >-
  /exports/eddie/scratch/s1970716/models/summarization/longt5_xl_sfd_bp_20/checkpoint-280
tags:
  - generated_from_trainer
datasets:
  - learn3r/summ_screen_fd_bp
metrics:
  - rouge
model-index:
  - name: longt5_xl_sfd_bp_40
    results:
      - task:
          name: Summarization
          type: summarization
        dataset:
          name: learn3r/summ_screen_fd_bp
          type: learn3r/summ_screen_fd_bp
        metrics:
          - name: Rouge1
            type: rouge
            value: 40.6965

longt5_xl_sfd_bp_40

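This model is a fine-tuned version of the local checkpoint /exports/eddie/scratch/s1970716/models/summarization/longt5_xl_sfd_bp_20/checkpoint-280 on the learn3r/summ_screen_fd_bp dataset. It achieves the following results on the evaluation set; these figures match the row at epoch 4.94 (step 71) of the training results table, which has the lowest validation loss, not the final training step: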

  • Loss: 2.8277
  • Rouge1: 40.6965
  • Rouge2: 17.2793
  • Rougel: 27.8429
  • Rougelsum: 39.0726
  • Gen Len: 294.0890
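For readers unfamiliar with the metric, the sketch below shows roughly what a Rouge1 score measures: unigram overlap between a reference summary and a generated one, reported here on a 0 to 100 scale. It is a simplified illustration only. The function name `rouge1_f1` is hypothetical, it tokenizes by whitespace, and it is not the implementation behind the numbers above, which may apply stemming and other normalization.

```python
from collections import Counter


def rouge1_f1(reference: str, candidate: str) -> float:
    """Simplified ROUGE-1 F1 on a 0-100 scale (whitespace tokens, lowercased)."""
    ref_counts = Counter(reference.lower().split())
    cand_counts = Counter(candidate.lower().split())

    # Clipped unigram overlap: each token counts at most as often as it
    # appears in both texts.
    overlap = sum((ref_counts & cand_counts).values())
    if overlap == 0:
        return 0.0

    precision = overlap / sum(cand_counts.values())
    recall = overlap / sum(ref_counts.values())
    return 100 * 2 * precision * recall / (precision + recall)


if __name__ == "__main__":
    # Two of four tokens overlap in each direction.
    print(rouge1_f1("a b c d", "a b x y"))
```

Rouge2 applies the same idea to bigrams, and RougeL is based on the longest common subsequence instead of fixed-size n-grams.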

Model description

More information needed

Intended uses & limitations

More information needed

Training and evaluation data

More information needed

Training procedure

Training hyperparameters

The following hyperparameters were used during training:

  • learning_rate: 0.001
  • train_batch_size: 8
  • eval_batch_size: 8
  • seed: 42
  • gradient_accumulation_steps: 32
  • total_train_batch_size: 256
  • optimizer: Adam with betas=(0.9,0.999) and epsilon=1e-08
  • lr_scheduler_type: constant
  • num_epochs: 20.0
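The relationship between the batch-size settings above can be checked with a little arithmetic. The sketch below assumes a single training device (`num_devices = 1` is an assumption, chosen because it reproduces the listed total of 256) and uses the final row of the training results table (step 280 at epoch 19.48) to estimate optimizer steps per epoch. Because the epoch column is rounded to two decimals, the derived values are approximate.

```python
# Values taken from the hyperparameter list.
train_batch_size = 8
gradient_accumulation_steps = 32
num_devices = 1  # assumption: not stated in the card

# Effective number of examples per optimizer step.
total_train_batch_size = (
    train_batch_size * gradient_accumulation_steps * num_devices
)

# Final logged row of the training results table.
final_step = 280
final_epoch = 19.48

# Rough estimates derived from the rounded epoch value.
steps_per_epoch = final_step / final_epoch
approx_examples_per_epoch = steps_per_epoch * total_train_batch_size

if __name__ == "__main__":
    print(f"total_train_batch_size: {total_train_batch_size}")
    print(f"optimizer steps per epoch: {steps_per_epoch:.2f}")
    print(f"approx. examples per epoch: {approx_examples_per_epoch:.0f}")
```

With roughly 14 optimizer steps per epoch, 20 requested epochs correspond to about 280 steps, which is where the training log ends.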

Training results

| Training Loss | Epoch | Step | Validation Loss | Rouge1  | Rouge2  | Rougel  | Rougelsum | Gen Len  |
|---------------|-------|------|-----------------|---------|---------|---------|-----------|----------|
| 0.1033        | 0.97  | 14   | 3.1096          | 40.3355 | 16.0557 | 27.4642 | 38.6436   | 279.7507 |
| 0.0836        | 1.95  | 28   | 3.0361          | 38.2411 | 16.5448 | 26.6409 | 36.5841   | 368.4659 |
| 0.0717        | 2.99  | 43   | 2.9389          | 32.0114 | 13.7953 | 22.278  | 30.726    | 489.2047 |
| 0.0614        | 3.97  | 57   | 3.0221          | 32.969  | 13.7053 | 22.7428 | 31.6951   | 477.7240 |
| 0.1275        | 4.94  | 71   | 2.8277          | 40.6965 | 17.2793 | 27.8429 | 39.0726   | 294.0890 |
| 0.0511        | 5.98  | 86   | 3.0433          | 33.6479 | 15.0729 | 23.5443 | 32.3304   | 476.8457 |
| 0.0666        | 6.96  | 100  | 3.1150          | 37.743  | 16.2368 | 26.2524 | 36.1313   | 390.4362 |
| 0.0398        | 8.0   | 115  | 3.2225          | 41.3177 | 16.6663 | 28.7806 | 39.5914   | 203.4006 |
| 0.0396        | 8.97  | 129  | 3.1462          | 39.9605 | 16.6732 | 28.3459 | 38.226    | 123.8309 |
| 0.0466        | 9.95  | 143  | 3.2545          | 40.7977 | 16.9616 | 27.427  | 38.8973   | 298.5579 |
| 0.043         | 10.99 | 158  | 3.3188          | 36.6349 | 16.1781 | 25.1327 | 35.1793   | 425.1395 |
| 0.0538        | 11.97 | 172  | 2.8277          | 36.7878 | 15.1186 | 24.9774 | 35.275    | 394.8605 |
| 0.028         | 12.94 | 186  | 3.4398          | 42.9644 | 18.1812 | 29.1539 | 41.0465   | 188.1780 |
| 0.1056        | 13.98 | 201  | 3.3348          | 41.1626 | 17.1605 | 27.6558 | 39.2548   | 261.2967 |
| 0.0303        | 14.96 | 215  | 3.0238          | 42.2372 | 17.7292 | 28.8099 | 40.3325   | 231.6083 |
| 0.0234        | 16.0  | 230  | 3.3485          | 41.714  | 17.7161 | 27.9345 | 39.8519   | 306.1602 |
| 0.0263        | 16.97 | 244  | 3.2419          | 42.0014 | 17.2719 | 28.499  | 40.2024   | 210.7122 |
| 0.0225        | 17.95 | 258  | 3.3453          | 41.7766 | 17.7154 | 28.4692 | 39.9749   | 248.5786 |
| 0.0225        | 18.99 | 273  | 3.4441          | 42.1727 | 17.598  | 28.5122 | 40.4005   | 248.6380 |
| 0.0211        | 19.48 | 280  | 3.3211          | 42.5239 | 17.4102 | 28.6868 | 40.6537   | 200.3798 |
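The training results can be read in more than one way depending on which column is used to pick a checkpoint. The sketch below copies the Step, Validation Loss, and Rouge1 columns from the table and compares the row with the lowest validation loss against the row with the highest Rouge1. The variable names are illustrative, and nothing in the card states which criterion the author actually used for checkpoint selection.

```python
# (step, validation_loss, rouge1) copied from the training results table.
rows = [
    (14, 3.1096, 40.3355), (28, 3.0361, 38.2411), (43, 2.9389, 32.0114),
    (57, 3.0221, 32.969), (71, 2.8277, 40.6965), (86, 3.0433, 33.6479),
    (100, 3.1150, 37.743), (115, 3.2225, 41.3177), (129, 3.1462, 39.9605),
    (143, 3.2545, 40.7977), (158, 3.3188, 36.6349), (172, 2.8277, 36.7878),
    (186, 3.4398, 42.9644), (201, 3.3348, 41.1626), (215, 3.0238, 42.2372),
    (230, 3.3485, 41.714), (244, 3.2419, 42.0014), (258, 3.3453, 41.7766),
    (273, 3.4441, 42.1727), (280, 3.3211, 42.5239),
]

# min() returns the first row among ties; steps 71 and 172 both log 2.8277.
best_by_loss = min(rows, key=lambda r: r[1])
best_by_rouge1 = max(rows, key=lambda r: r[2])

if __name__ == "__main__":
    print("lowest validation loss:", best_by_loss)
    print("highest Rouge1:", best_by_rouge1)
```

The lowest-loss row (step 71) carries the Rouge1 value reported at the top of the card, while the highest Rouge1 in the log occurs later, at step 186, with a higher validation loss.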

Framework versions

  • Transformers 4.34.1
  • Pytorch 2.1.0+cu121
  • Datasets 2.14.5
  • Tokenizers 0.14.1