---
language:
- en
license: apache-2.0
tags:
- generated_from_trainer
datasets:
- pszemraj/govreport-summarization-8192
metrics:
- rouge
pipeline_tag: summarization
base_model: allenai/led-base-16384
model-index:
- name: led-base-16384-finetuned-govreport
  results:
  - task:
      type: summarization
      name: Summarization
    dataset:
      name: pszemraj/govreport-summarization-8192
      type: pszemraj/govreport-summarization-8192
      config: split
      split: validation
      args: split
    metrics:
    - type: rouge
      value: 50.3574
      name: ROUGE-1
    - type: rouge
      value: 20.0448
      name: ROUGE-2
    - type: rouge
      value: 22.2156
      name: ROUGE-L
    - type: rouge
      value: 22.2156
      name: ROUGE-LSUM
  - task:
      type: summarization
      name: Summarization
    dataset:
      name: pszemraj/govreport-summarization-8192
      type: pszemraj/govreport-summarization-8192
      config: split
      split: test
      args: split
    metrics:
    - type: rouge
      value: 52.6378
      name: ROUGE-1
    - type: rouge
      value: 22.213
      name: ROUGE-2
    - type: rouge
      value: 23.5898
      name: ROUGE-L
    - type: rouge
      value: 23.5898
      name: ROUGE-LSUM
---
# led-base-16384-finetuned-govreport
This model is a fine-tuned version of [allenai/led-base-16384](https://huggingface.co/allenai/led-base-16384) on the [pszemraj/govreport-summarization-8192](https://huggingface.co/datasets/pszemraj/govreport-summarization-8192) dataset.
It achieves the following results on the evaluation set:
- Loss: 1.2887
The ROUGE metrics were computed in a separate, later evaluation step (the final notebook can be found [HERE](https://www.kaggle.com/code/marcoloureno/led-base-16384-finetuned-govreport-metrics/notebook)); a hedged sketch of such a computation follows the results below.
It achieved the following results on the validation set:
- ROUGE-1: 50.3574
- ROUGE-2: 20.0448
- ROUGE-L: 22.2156
- ROUGE-Lsum: 22.2156
It achieved the following results on the test set:
- ROUGE-1: 52.6378
- ROUGE-2: 22.2130
- ROUGE-L: 23.5898
- ROUGE-Lsum: 23.5898
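As a rough illustration only (not the exact Kaggle notebook), scores like these can be computed with the `evaluate` library; the example summaries below are placeholders:

```python
# Minimal sketch of ROUGE scoring with the `evaluate` library; this is an
# assumption about the workflow, not the notebook linked above.
import evaluate

rouge = evaluate.load("rouge")

# Hypothetical generated and reference summaries.
predictions = ["the report finds that agencies need clearer guidance on oversight"]
references = ["the report concludes that agencies require clearer oversight guidance"]

scores = rouge.compute(predictions=predictions, references=references)
# `evaluate` returns fractions in [0, 1]; this card reports percentages.
print({name: round(value * 100, 4) for name, value in scores.items()})
```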
## Model description
As described in [Longformer: The Long-Document Transformer](https://arxiv.org/pdf/2004.05150.pdf) by Iz Beltagy, Matthew E. Peters, and Arman Cohan, [Allenai's Longformer Encoder-Decoder (LED)](https://github.com/allenai/longformer#longformer) was initialized from [*bart-base*](https://huggingface.co/facebook/bart-base), since both models share the exact same architecture. To be able to process 16K tokens, *bart-base*'s position embedding matrix was simply copied 16 times.
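Conceptually, that extension looks like the following sketch (the tensor names and sizes below are illustrative stand-ins, not the actual LED conversion script):

```python
# Conceptual sketch: extending bart-base's 1024-position embedding matrix to
# 16384 positions by tiling it 16 times. Tensors here are random stand-ins.
import torch

max_pos_bart = 1024    # bart-base's maximum number of positions
factor = 16            # 16 * 1024 = 16384 positions for LED
hidden_size = 768      # bart-base hidden size

bart_pos_emb = torch.randn(max_pos_bart, hidden_size)  # stand-in for real weights
led_pos_emb = bart_pos_emb.repeat(factor, 1)           # shape: (16384, 768)
assert led_pos_emb.shape == (16384, hidden_size)
```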
This model is especially interesting for long-range summarization and question answering.
## Intended uses & limitations
[pszemraj/govreport-summarization-8192](https://huggingface.co/datasets/pszemraj/govreport-summarization-8192) is a pre-processed version of the dataset [ccdv/govreport-summarization](https://huggingface.co/datasets/ccdv/govreport-summarization), which is a dataset for summarization of long documents adapted from this [repository](https://github.com/luyang-huang96/LongDocSum) and this [paper](https://arxiv.org/pdf/2104.02112.pdf).
Allenai's LED model was fine-tuned on this dataset, enabling the summarization of documents of up to 16384 tokens.
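A minimal inference sketch, assuming the standard `transformers` API for LED checkpoints; the generation settings, the dataset column name `report`, and the shortened repository id are assumptions, not values taken from the training notebook:

```python
# Hedged usage sketch for this checkpoint; adjust the hub id, column name,
# and generation settings to your setup.
import torch
from datasets import load_dataset
from transformers import AutoModelForSeq2SeqLM, AutoTokenizer

model_name = "led-base-16384-finetuned-govreport"  # replace with the full hub id
tokenizer = AutoTokenizer.from_pretrained(model_name)
model = AutoModelForSeq2SeqLM.from_pretrained(model_name)

# One long GovReport document from the fine-tuning dataset (column name assumed).
sample = load_dataset("pszemraj/govreport-summarization-8192", split="validation")[0]
inputs = tokenizer(sample["report"], max_length=16384, truncation=True, return_tensors="pt")

# LED expects global attention on at least the first token.
global_attention_mask = torch.zeros_like(inputs["input_ids"])
global_attention_mask[:, 0] = 1

summary_ids = model.generate(
    inputs["input_ids"],
    attention_mask=inputs["attention_mask"],
    global_attention_mask=global_attention_mask,
    max_length=512,   # illustrative generation settings
    num_beams=4,
)
print(tokenizer.decode(summary_ids[0], skip_special_tokens=True))
```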
## Training procedure
### Training hyperparameters
The following hyperparameters were used during training (a sketch of the corresponding `Seq2SeqTrainingArguments` follows the list):
- learning_rate: 5e-05
- train_batch_size: 1
- eval_batch_size: 1
- seed: 42
- gradient_accumulation_steps: 8
- total_train_batch_size: 8
- optimizer: Adam with betas=(0.9,0.999) and epsilon=1e-08
- lr_scheduler_type: linear
- num_epochs: 2
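For context, this is roughly how the hyperparameters above map onto `Seq2SeqTrainingArguments` in Transformers 4.30; the output directory and evaluation settings are assumptions, not taken from the original training script:

```python
# Sketch of training arguments mirroring the listed hyperparameters.
from transformers import Seq2SeqTrainingArguments

training_args = Seq2SeqTrainingArguments(
    output_dir="led-base-16384-finetuned-govreport",  # assumed output directory
    learning_rate=5e-5,
    per_device_train_batch_size=1,
    per_device_eval_batch_size=1,
    seed=42,
    gradient_accumulation_steps=8,   # effective train batch size of 8
    optim="adamw_hf",                # Adam with betas=(0.9, 0.999), epsilon=1e-08
    lr_scheduler_type="linear",
    num_train_epochs=2,
    evaluation_strategy="steps",     # assumption: evaluate every 250 steps, per the table below
    eval_steps=250,
)
```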
### Training results
| Training Loss | Epoch | Step | Validation Loss |
|:-------------:|:-----:|:----:|:---------------:|
| 1.1492 | 0.24 | 250 | 1.4233 |
| 1.0077 | 0.49 | 500 | 1.3813 |
| 1.0069 | 0.73 | 750 | 1.3499 |
| 0.9639 | 0.98 | 1000 | 1.3216 |
| 0.7996 | 1.22 | 1250 | 1.3172 |
| 0.9395 | 1.46 | 1500 | 1.3003 |
| 0.913 | 1.71 | 1750 | 1.2919 |
| 0.8843 | 1.95 | 2000 | 1.2887 |
### Framework versions
- Transformers 4.30.2
- Pytorch 2.0.0
- Datasets 2.1.0
- Tokenizers 0.13.3