
longt5_xl_summ_screen_20

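This model is a fine-tuned version of /exports/eddie/scratch/s1970716/models/summarization/longt5_xl_summ_screen/checkpoint-140 (a local checkpoint path, not a Hub model id) on the summ_screen_fd configuration of the tau/scrolls dataset. It achieves the following results on the evaluation set; the figures match the step-28 row (epoch 1.95) of the training results, which has the lowest validation loss: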

  • Loss: 3.1917
  • Rouge1: 28.1708
  • Rouge2: 6.6895
  • Rougel: 18.1637
  • Rougelsum: 24.3987
  • Gen Len: 96.2041
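The card gives no usage instructions, so the following is only a minimal sketch of loading the checkpoint for summarization with the Transformers library. It assumes the repository id shown on this page (learn3r/longt5_xl_summ_screen_20) hosts both the weights and the tokenizer. The function name `summarize` and the limits `max_input_tokens` and `max_new_tokens` are illustrative choices, not values documented by the author.

```python
# Repository id as shown on this page; assumed to contain weights and tokenizer.
MODEL_ID = "learn3r/longt5_xl_summ_screen_20"


def summarize(text, max_input_tokens=4096, max_new_tokens=128):
    """Summarize a transcript with the fine-tuned LongT5 checkpoint.

    max_input_tokens and max_new_tokens are assumed defaults; the card does
    not state the lengths used for training or evaluation.
    """
    # Imported lazily so this module can be loaded without transformers installed.
    from transformers import AutoModelForSeq2SeqLM, AutoTokenizer

    tokenizer = AutoTokenizer.from_pretrained(MODEL_ID)
    model = AutoModelForSeq2SeqLM.from_pretrained(MODEL_ID)

    # Truncate long inputs to the assumed maximum length.
    inputs = tokenizer(
        text,
        return_tensors="pt",
        truncation=True,
        max_length=max_input_tokens,
    )
    output_ids = model.generate(**inputs, max_new_tokens=max_new_tokens)
    return tokenizer.decode(output_ids[0], skip_special_tokens=True)
```

Calling `summarize(transcript)` downloads an XL-sized model, so it needs substantial memory; loading the model once and reusing it would be preferable for repeated calls.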

Model description

More information needed

Intended uses & limitations

More information needed

Training and evaluation data

More information needed

Training procedure

Training hyperparameters

The following hyperparameters were used during training:

  • learning_rate: 0.001
  • train_batch_size: 8
  • eval_batch_size: 2
  • seed: 42
  • gradient_accumulation_steps: 32
  • total_train_batch_size: 256
  • optimizer: Adam with betas=(0.9,0.999) and epsilon=1e-08
  • lr_scheduler_type: constant
  • num_epochs: 10.0
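As a quick check on how these values relate, the sketch below recomputes the total training batch size from the per-device batch size and the gradient accumulation steps. The dictionary simply restates the list above. The helper `effective_batch_size` and the single-device default are assumptions for illustration, since the card lists no device count.

```python
# Hyperparameters restated from the list above.
hparams = {
    "learning_rate": 1e-3,
    "train_batch_size": 8,
    "eval_batch_size": 2,
    "seed": 42,
    "gradient_accumulation_steps": 32,
    "total_train_batch_size": 256,
    "lr_scheduler_type": "constant",
    "num_epochs": 10.0,
}


def effective_batch_size(per_device, accumulation_steps, num_devices=1):
    """Examples consumed per optimizer step.

    num_devices=1 is an assumption: the card does not list a device count.
    """
    return per_device * accumulation_steps * num_devices


effective = effective_batch_size(
    hparams["train_batch_size"],
    hparams["gradient_accumulation_steps"],
)
# 8 * 32 = 256, consistent with the reported total on a single device.
assert effective == hparams["total_train_batch_size"]
print(effective)  # 256
```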

Training results

| Training Loss | Epoch | Step | Validation Loss | Rouge1  | Rouge2 | Rougel  | Rougelsum | Gen Len  |
|---------------|-------|------|-----------------|---------|--------|---------|-----------|----------|
| 0.4063        | 0.97  | 14   | 3.7385          | 27.9171 | 6.7215 | 17.9315 | 24.363    | 71.9083  |
| 0.3125        | 1.95  | 28   | 3.1917          | 28.1708 | 6.6895 | 18.1637 | 24.3987   | 96.2041  |
| 0.2177        | 2.99  | 43   | 3.9998          | 29.3167 | 5.9    | 17.3608 | 25.6945   | 198.0473 |
| 0.1753        | 3.97  | 57   | 4.2287          | 29.0605 | 6.2534 | 17.5744 | 25.6415   | 158.6509 |
| 0.2747        | 4.94  | 71   | 4.1027          | 31.2245 | 6.5663 | 18.1588 | 26.8996   | 118.4438 |
| 0.1045        | 5.98  | 86   | 5.0581          | 30.6056 | 6.8892 | 18.4933 | 26.4027   | 92.9882  |
| 0.0875        | 6.96  | 100  | 4.5941          | 32.5234 | 7.3736 | 18.8958 | 28.4738   | 160.8964 |
| 0.1572        | 8.0   | 115  | 4.9386          | 31.4658 | 7.2592 | 18.4796 | 27.6047   | 121.0178 |
| 0.0867        | 8.97  | 129  | 4.5565          | 32.0531 | 7.0692 | 18.5551 | 27.3373   | 160.4793 |
| 0.0748        | 9.74  | 140  | 5.0866          | 32.2717 | 7.7004 | 18.9107 | 28.3874   | 124.1893 |
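The results above can be read in more than one way depending on the selection criterion. The sketch below copies the rows into Python and picks the best row by validation loss and by Rouge1. The field names are illustrative labels for the table columns, not identifiers from the training script.

```python
# Column labels chosen for this sketch; values are copied from the table above.
FIELDS = (
    "train_loss", "epoch", "step", "val_loss",
    "rouge1", "rouge2", "rougeL", "rougeLsum", "gen_len",
)
ROWS = [
    (0.4063, 0.97, 14, 3.7385, 27.9171, 6.7215, 17.9315, 24.363, 71.9083),
    (0.3125, 1.95, 28, 3.1917, 28.1708, 6.6895, 18.1637, 24.3987, 96.2041),
    (0.2177, 2.99, 43, 3.9998, 29.3167, 5.9, 17.3608, 25.6945, 198.0473),
    (0.1753, 3.97, 57, 4.2287, 29.0605, 6.2534, 17.5744, 25.6415, 158.6509),
    (0.2747, 4.94, 71, 4.1027, 31.2245, 6.5663, 18.1588, 26.8996, 118.4438),
    (0.1045, 5.98, 86, 5.0581, 30.6056, 6.8892, 18.4933, 26.4027, 92.9882),
    (0.0875, 6.96, 100, 4.5941, 32.5234, 7.3736, 18.8958, 28.4738, 160.8964),
    (0.1572, 8.0, 115, 4.9386, 31.4658, 7.2592, 18.4796, 27.6047, 121.0178),
    (0.0867, 8.97, 129, 4.5565, 32.0531, 7.0692, 18.5551, 27.3373, 160.4793),
    (0.0748, 9.74, 140, 5.0866, 32.2717, 7.7004, 18.9107, 28.3874, 124.1893),
]

records = [dict(zip(FIELDS, row)) for row in ROWS]

# Lowest validation loss: this row matches the headline evaluation results.
best_by_loss = min(records, key=lambda r: r["val_loss"])
# Highest Rouge1: reached later in training, at a higher validation loss.
best_by_rouge1 = max(records, key=lambda r: r["rouge1"])

print(best_by_loss["step"], best_by_loss["val_loss"])    # 28 3.1917
print(best_by_rouge1["step"], best_by_rouge1["rouge1"])  # 100 32.5234
```

The two criteria disagree: validation loss is lowest at step 28, while Rouge1 peaks at step 100. This suggests, without confirming, that the reported checkpoint was chosen by validation loss rather than by ROUGE.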

Framework versions

  • Transformers 4.34.0.dev0
  • Pytorch 2.0.1+cu117
  • Datasets 2.14.5
  • Tokenizers 0.13.3

Dataset used to train learn3r/longt5_xl_summ_screen_20: tau/scrolls (summ_screen_fd)
