	---
	language:
	- en
	license: apache-2.0
	tags:
	- generated_from_trainer
	datasets:
	- pszemraj/govreport-summarization-8192
	metrics:
	- rouge
	pipeline_tag: summarization
	base_model: allenai/led-base-16384
	model-index:
	- name: led-base-16384-finetuned-govreport
	  results:
	  - task:
	      type: summarization
	      name: Summarization
	    dataset:
	      name: pszemraj/govreport-summarization-8192
	      type: pszemraj/govreport-summarization-8192
	      config: split
	      split: validation
	      args: split
	    metrics:
	    - type: rouge
	      value: 50.3574
	      name: ROUGE-1
	    - type: rouge
	      value: 20.0448
	      name: ROUGE-2
	    - type: rouge
	      value: 22.2156
	      name: ROUGE-L
	    - type: rouge
	      value: 22.2156
	      name: ROUGE-LSUM
	  - task:
	      type: summarization
	      name: Summarization
	    dataset:
	      name: pszemraj/govreport-summarization-8192
	      type: pszemraj/govreport-summarization-8192
	      config: split
	      split: test
	      args: split
	    metrics:
	    - type: rouge
	      value: 52.6378
	      name: ROUGE-1
	    - type: rouge
	      value: 22.213
	      name: ROUGE-2
	    - type: rouge
	      value: 23.5898
	      name: ROUGE-L
	    - type: rouge
	      value: 23.5898
	      name: ROUGE-LSUM
	---

	# led-base-16384-finetuned-govreport

	This model is a fine-tuned version of [allenai/led-base-16384](https://huggingface.co/allenai/led-base-16384) on the [pszemraj/govreport-summarization-8192](https://huggingface.co/datasets/pszemraj/govreport-summarization-8192) dataset.
	It achieves the following results on the evaluation set:
	- Loss: 1.2887

	The ROUGE metrics were computed afterwards in a separate notebook, which can be found [here](https://www.kaggle.com/code/marcoloureno/led-base-16384-finetuned-govreport-metrics/notebook).

	It achieved the following results on the validation set:
	- ROUGE-1: 50.3574
	- ROUGE-2: 20.0448
	- ROUGE-L: 22.2156
	- ROUGE-Lsum: 22.2156

	It achieved the following results on the test set:
	- ROUGE-1: 52.6378
	- ROUGE-2: 22.2130
	- ROUGE-L: 23.5898
	- ROUGE-Lsum: 23.5898
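
For readers unfamiliar with the metric, the following is a minimal sketch of a ROUGE-1 F-measure computed from clipped unigram overlap and scaled to 0-100, which is presumably the scale of the scores above. It assumes lowercase whitespace tokenization and no stemming, so it is a simplification and not the implementation used in the linked notebook. The function name `rouge1_f` and the example sentences are hypothetical.

```python
from collections import Counter


def rouge1_f(reference: str, candidate: str) -> float:
    """Simplified ROUGE-1 F-measure (0-100) from clipped unigram overlap."""
    ref_counts = Counter(reference.lower().split())
    cand_counts = Counter(candidate.lower().split())
    # Counter intersection keeps the minimum count of each shared token.
    overlap = sum((ref_counts & cand_counts).values())
    if overlap == 0:
        return 0.0
    precision = overlap / sum(cand_counts.values())
    recall = overlap / sum(ref_counts.values())
    return 100 * 2 * precision * recall / (precision + recall)


score = rouge1_f(
    "the agency should improve its oversight of federal grants",
    "the agency should improve grant oversight",
)
print(round(score, 2))  # 66.67
```

Five of the six candidate tokens appear among the nine reference tokens, giving a precision of 5/6 and a recall of 5/9.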


	## Model description

	As described in [Longformer: The Long-Document Transformer](https://arxiv.org/pdf/2004.05150.pdf) by Iz Beltagy, Matthew E. Peters, and Arman Cohan, [AllenAI's Longformer Encoder-Decoder (LED)](https://github.com/allenai/longformer#longformer) was initialized from [bart-base](https://huggingface.co/facebook/bart-base) since both models share the exact same architecture. To be able to process 16K tokens, bart-base's position embedding matrix was simply copied 16 times.
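
The copying step can be illustrated with a toy example. The sketch below assumes bart-base's nominal 1024 learned positions, so that 16 copies give 16384, and uses tiny stand-in vectors in place of real embeddings. `tile_position_embeddings` is a hypothetical helper, not code from the LED repository.

```python
def tile_position_embeddings(base, copies):
    """Repeat a position-embedding matrix (list of rows) `copies` times."""
    # Each row is copied so the tiled matrix does not alias the original rows.
    return [list(row) for _ in range(copies) for row in base]


BASE_POSITIONS = 1024   # assumption: bart-base's maximum input length
COPIES = 16             # from the description above

# Toy 2-dimensional "embeddings"; real ones are learned vectors.
base = [[float(i), float(-i)] for i in range(BASE_POSITIONS)]
extended = tile_position_embeddings(base, COPIES)

print(len(extended))                        # 16384
print(extended[BASE_POSITIONS] == base[0])  # True: position 1024 reuses position 0
```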

	This model is especially interesting for long-range summarization and question answering.

	## Intended uses & limitations

	[pszemraj/govreport-summarization-8192](https://huggingface.co/datasets/pszemraj/govreport-summarization-8192) is a pre-processed version of the dataset [ccdv/govreport-summarization](https://huggingface.co/datasets/ccdv/govreport-summarization), which is a dataset for summarization of long documents adapted from this [repository](https://github.com/luyang-huang96/LongDocSum) and this [paper](https://arxiv.org/pdf/2104.02112.pdf).

	AllenAI's LED model was fine-tuned on this dataset, allowing it to summarize documents of up to 16384 tokens.
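
A minimal inference sketch with the `transformers` library might look as follows. It assumes `torch` and `transformers` are installed, puts global attention on the first token only (a common choice for LED summarization), and uses illustrative decoding settings that are not taken from this card. The helper names and the `<namespace>` placeholder in the repository id are hypothetical.

```python
def first_token_global_mask(batch_size: int, seq_len: int) -> list:
    """Global attention mask as nested lists: 1 on the first token, 0 elsewhere."""
    return [[1] + [0] * (seq_len - 1) for _ in range(batch_size)]


def summarize(text: str, model_id: str, max_input_tokens: int = 16384) -> str:
    """Summarize `text` with an LED checkpoint (downloads weights on first use)."""
    # Imported lazily so the helper above can be used without these packages.
    import torch
    from transformers import AutoTokenizer, LEDForConditionalGeneration

    tokenizer = AutoTokenizer.from_pretrained(model_id)
    model = LEDForConditionalGeneration.from_pretrained(model_id)

    inputs = tokenizer(
        text, return_tensors="pt", truncation=True, max_length=max_input_tokens
    )
    batch_size, seq_len = inputs["input_ids"].shape
    global_attention_mask = torch.tensor(first_token_global_mask(batch_size, seq_len))

    with torch.no_grad():
        summary_ids = model.generate(
            inputs["input_ids"],
            attention_mask=inputs["attention_mask"],
            global_attention_mask=global_attention_mask,
            num_beams=4,          # assumption: illustrative decoding settings
            max_new_tokens=512,   # assumption: not taken from the model card
        )
    return tokenizer.decode(summary_ids[0], skip_special_tokens=True)


# Hypothetical usage; replace the placeholder with the actual repository id:
# summary = summarize(long_report, "<namespace>/led-base-16384-finetuned-govreport")
```

Inputs longer than `max_input_tokens` are truncated rather than chunked, so very long reports lose their tail in this sketch.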

	## Training procedure

	### Training hyperparameters

	The following hyperparameters were used during training:
	- learning_rate: 5e-05
	- train_batch_size: 1
	- eval_batch_size: 1
	- seed: 42
	- gradient_accumulation_steps: 8
	- total_train_batch_size: 8
	- optimizer: Adam with betas=(0.9,0.999) and epsilon=1e-08
	- lr_scheduler_type: linear
	- num_epochs: 2
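
The relationship between these values can be checked with a little arithmetic. The sketch below recomputes the effective batch size and evaluates a linear decay schedule. It assumes a single device, no warmup, and roughly 2050 optimizer steps in total, none of which are stated in the card. `linear_lr` is a hypothetical helper.

```python
LEARNING_RATE = 5e-05
TRAIN_BATCH_SIZE = 1
GRADIENT_ACCUMULATION_STEPS = 8
NUM_DEVICES = 1       # assumption: single-GPU training
TOTAL_STEPS = 2050    # assumption: approximate optimizer steps over 2 epochs


def linear_lr(step: int, base_lr: float = LEARNING_RATE,
              total_steps: int = TOTAL_STEPS) -> float:
    """Linearly decay the learning rate from base_lr to 0 (no warmup assumed)."""
    remaining = max(0, total_steps - step)
    return base_lr * (remaining / total_steps)


total_train_batch_size = TRAIN_BATCH_SIZE * GRADIENT_ACCUMULATION_STEPS * NUM_DEVICES
print(total_train_batch_size)   # 8
print(linear_lr(0))             # 5e-05
print(linear_lr(1025))          # 2.5e-05
```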

	### Training results

	| Training Loss | Epoch | Step | Validation Loss |
	|:-------------:|:-----:|:----:|:---------------:|
	| 1.1492 | 0.24 | 250 | 1.4233 |
	| 1.0077 | 0.49 | 500 | 1.3813 |
	| 1.0069 | 0.73 | 750 | 1.3499 |
	| 0.9639 | 0.98 | 1000 | 1.3216 |
	| 0.7996 | 1.22 | 1250 | 1.3172 |
	| 0.9395 | 1.46 | 1500 | 1.3003 |
	| 0.913 | 1.71 | 1750 | 1.2919 |
	| 0.8843 | 1.95 | 2000 | 1.2887 |
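
The log can also be inspected programmatically. The short sketch below copies the rows from the table, checks that the validation loss improved at every evaluation, and estimates the number of optimizer steps per epoch. Because the epoch column is rounded to two decimals, that estimate is approximate.

```python
# (training_loss, epoch, step, validation_loss) rows copied from the table above
LOG = [
    (1.1492, 0.24, 250, 1.4233),
    (1.0077, 0.49, 500, 1.3813),
    (1.0069, 0.73, 750, 1.3499),
    (0.9639, 0.98, 1000, 1.3216),
    (0.7996, 1.22, 1250, 1.3172),
    (0.9395, 1.46, 1500, 1.3003),
    (0.913, 1.71, 1750, 1.2919),
    (0.8843, 1.95, 2000, 1.2887),
]

val_losses = [row[3] for row in LOG]
# True when every evaluation improved on the previous one.
monotonic = all(later < earlier for earlier, later in zip(val_losses, val_losses[1:]))
best = min(LOG, key=lambda row: row[3])
# Epoch values are rounded to two decimals, so this is only an estimate.
steps_per_epoch = LOG[-1][2] / LOG[-1][1]

print(monotonic)                # True
print(best[2], best[3])         # 2000 1.2887
print(round(steps_per_epoch))   # 1026
```

The lowest validation loss, at step 2000, matches the evaluation loss reported at the top of the card.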


	### Framework versions

	- Transformers 4.30.2
	- Pytorch 2.0.0
	- Datasets 2.1.0
	- Tokenizers 0.13.3