---
language:
- en
license: apache-2.0
tags:
- generated_from_trainer
datasets:
- pszemraj/govreport-summarization-8192
metrics:
- rouge
pipeline_tag: summarization
base_model: allenai/led-base-16384
model-index:
- name: led-base-16384-finetuned-govreport
  results:
  - task:
      type: summarization
      name: Summarization
    dataset:
      name: pszemraj/govreport-summarization-8192
      type: pszemraj/govreport-summarization-8192
      config: split
      split: validation
      args: split
    metrics:
    - type: rouge
      value: 50.3574
      name: ROUGE-1
    - type: rouge
      value: 20.0448
      name: ROUGE-2
    - type: rouge
      value: 22.2156
      name: ROUGE-L
    - type: rouge
      value: 22.2156
      name: ROUGE-LSUM
  - task:
      type: summarization
      name: Summarization
    dataset:
      name: pszemraj/govreport-summarization-8192
      type: pszemraj/govreport-summarization-8192
      config: split
      split: test
      args: split
    metrics:
    - type: rouge
      value: 52.6378
      name: ROUGE-1
    - type: rouge
      value: 22.213
      name: ROUGE-2
    - type: rouge
      value: 23.5898
      name: ROUGE-L
    - type: rouge
      value: 23.5898
      name: ROUGE-LSUM
---

# led-base-16384-finetuned-govreport

This model is a fine-tuned version of [allenai/led-base-16384](https://huggingface.co/allenai/led-base-16384) on the [pszemraj/govreport-summarization-8192](https://huggingface.co/datasets/pszemraj/govreport-summarization-8192) dataset.
It achieves the following results on the evaluation set:
- Loss: 1.2887

The ROUGE metrics were computed separately after training (the final notebook can be found [HERE](https://www.kaggle.com/code/marcoloureno/led-base-16384-finetuned-govreport-metrics/notebook)).

It achieved the following results on the validation set:
- ROUGE-1: 50.3574
- ROUGE-2: 20.0448
- ROUGE-L: 22.2156
- ROUGE-LSUM: 22.2156

It achieved the following results on the test set:
- ROUGE-1: 52.6378
- ROUGE-2: 22.2130
- ROUGE-L: 23.5898
- ROUGE-LSUM: 23.5898
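
As a rough sketch (the linked Kaggle notebook remains the authoritative reference), scores like these can be recomputed with the `evaluate` library from generated and reference summaries; whether `use_stemmer=True` matches the notebook's settings is an assumption.

```python
import evaluate

# Placeholder lists; in practice these come from running the model over the
# validation/test split of pszemraj/govreport-summarization-8192.
predictions = ["generated summary of a government report ..."]
references = ["reference summary from the dataset ..."]

rouge = evaluate.load("rouge")
scores = rouge.compute(predictions=predictions, references=references, use_stemmer=True)

# `evaluate` returns fractions in [0, 1]; the card reports them scaled by 100.
print({k: round(v * 100, 4) for k, v in scores.items()})
```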


## Model description

As described in [Longformer: The Long-Document Transformer](https://arxiv.org/pdf/2004.05150.pdf) by Iz Beltagy, Matthew E. Peters, and Arman Cohan, [Allenai's Longformer Encoder-Decoder (LED)](https://github.com/allenai/longformer#longformer) was initialized from [*bart-base*](https://huggingface.co/facebook/bart-base), since both models share the exact same architecture. To be able to process 16K tokens, *bart-base*'s position embedding matrix was simply copied 16 times.
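
The sketch below only illustrates that idea; it is not the conversion script Allenai actually used, and the handling of BART's two position-embedding offset rows is an assumption based on the current `transformers` implementation.

```python
from transformers import BartModel

# Rough illustration of extending bart-base's learned position embeddings
# from 1,024 to 16,384 positions by tiling the matrix 16 times.
bart = BartModel.from_pretrained("facebook/bart-base")

pos_emb = bart.encoder.embed_positions.weight    # (1024 + 2 offset rows, 768)
tiled = pos_emb[2:].detach().repeat(16, 1)       # drop the offset rows, tile 16x
print(tiled.shape)                               # torch.Size([16384, 768])
```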

This model is especially interesting for long-range summarization and question answering.

## Intended uses & limitations

[pszemraj/govreport-summarization-8192](https://huggingface.co/datasets/pszemraj/govreport-summarization-8192) is a pre-processed version of [ccdv/govreport-summarization](https://huggingface.co/datasets/ccdv/govreport-summarization), a long-document summarization dataset adapted from this [repository](https://github.com/luyang-huang96/LongDocSum) and this [paper](https://arxiv.org/pdf/2104.02112.pdf).
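
A minimal loading sketch, assuming the standard `datasets` API; the `"split"` config name is taken from the evaluation metadata above.

```python
from datasets import load_dataset

# "split" is the config name referenced in the evaluation metadata above.
dataset = load_dataset("pszemraj/govreport-summarization-8192", "split")
print(dataset)  # expected splits: train / validation / test
```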

Allenai's LED model was fine-tuned on this dataset, enabling the summarization of documents of up to 16,384 tokens.
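
A minimal inference sketch follows; the checkpoint id is a placeholder for this repository, and the generation settings are illustrative rather than the ones used to produce the scores above.

```python
import torch
from transformers import AutoTokenizer, LEDForConditionalGeneration

model_id = "led-base-16384-finetuned-govreport"  # replace with this repo's Hub id or a local path
tokenizer = AutoTokenizer.from_pretrained(model_id)
model = LEDForConditionalGeneration.from_pretrained(model_id)

text = open("report.txt").read()  # a long government report
inputs = tokenizer(text, return_tensors="pt", truncation=True, max_length=16384)

# LED works best with global attention on at least the first token.
global_attention_mask = torch.zeros_like(inputs["input_ids"])
global_attention_mask[:, 0] = 1

summary_ids = model.generate(
    inputs["input_ids"],
    attention_mask=inputs["attention_mask"],
    global_attention_mask=global_attention_mask,
    max_length=512,
    num_beams=4,
)
print(tokenizer.decode(summary_ids[0], skip_special_tokens=True))
```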

## Training procedure

### Training hyperparameters

The following hyperparameters were used during training:
- learning_rate: 5e-05
- train_batch_size: 1
- eval_batch_size: 1
- seed: 42
- gradient_accumulation_steps: 8
- total_train_batch_size: 8
- optimizer: Adam with betas=(0.9,0.999) and epsilon=1e-08
- lr_scheduler_type: linear
- num_epochs: 2
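
These settings map roughly onto `Seq2SeqTrainingArguments` as sketched below; anything not listed above (output directory, evaluation cadence) is an assumption.

```python
from transformers import Seq2SeqTrainingArguments

training_args = Seq2SeqTrainingArguments(
    output_dir="led-base-16384-finetuned-govreport",  # assumption
    learning_rate=5e-5,
    per_device_train_batch_size=1,
    per_device_eval_batch_size=1,
    seed=42,
    gradient_accumulation_steps=8,  # effective train batch size of 8
    lr_scheduler_type="linear",
    num_train_epochs=2,
    evaluation_strategy="steps",
    eval_steps=250,                 # matches the 250-step cadence in the results table below
)
```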

### Training results

| Training Loss | Epoch | Step | Validation Loss |
|:-------------:|:-----:|:----:|:---------------:|
| 1.1492        | 0.24  | 250  | 1.4233          |
| 1.0077        | 0.49  | 500  | 1.3813          |
| 1.0069        | 0.73  | 750  | 1.3499          |
| 0.9639        | 0.98  | 1000 | 1.3216          |
| 0.7996        | 1.22  | 1250 | 1.3172          |
| 0.9395        | 1.46  | 1500 | 1.3003          |
| 0.913         | 1.71  | 1750 | 1.2919          |
| 0.8843        | 1.95  | 2000 | 1.2887          |


### Framework versions

- Transformers 4.30.2
- Pytorch 2.0.0
- Datasets 2.1.0
- Tokenizers 0.13.3