UNIST-Eunchan/Research-Paper-Summarization-Pegasus-x-ArXiv


	---
	base_model: google/pegasus-x-base
	tags:
	- generated_from_trainer
	datasets:
	- ccdv/arxiv-summarization
	model-index:
	- name: Paper-Summarization-ArXiv
	  results:
	  - task:
	      name: Summarization
	      type: summarization
	    dataset:
	      name: ccdv/arxiv-summarization
	      type: ccdv/arxiv-summarization
	      config: section
	      split: test
	      args: section
	    metrics:
	    - name: ROUGE-1
	      type: rouge
	      value: 43.2305
	    - name: ROUGE-2
	      type: rouge
	      value: 16.6571
	    - name: ROUGE-L
	      type: rouge
	      value: 24.4315
	    - name: ROUGE-LSum
	      type: rouge
	      value: 33.9399
	license: bigscience-openrail-m
	language:
	- en
	metrics:
	- rouge
	library_name: transformers
	pipeline_tag: summarization
	---


	# Paper-Summarization-ArXiv

	This model is a fine-tuned version of [google/pegasus-x-base](https://huggingface.co/google/pegasus-x-base) on the arxiv-summarization dataset.

	It achieves the following results on the evaluation set:
	- Loss: 2.0127

	### Training results

	| Training Loss | Epoch | Step  | Validation Loss |
	|:-------------:|:-----:|:-----:|:---------------:|
	| 2.6153        | 1.0   | 3172  | 2.1045          |
	| 2.202         | 2.0   | 6344  | 2.0511          |
	| 2.1547        | 3.0   | 9516  | 2.0282          |
	| 2.132         | 4.0   | 12688 | 2.0164          |
	| 2.1222        | 5.0   | 15860 | 2.0127          |
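
The step counts in the table can be cross-checked against the hyperparameters listed later in this card. The sketch below is only arithmetic on numbers reported here. The variable names are illustrative, and the per-epoch example count is an estimate that assumes a single device and ignores any partial final batch.

```python
# Numbers reported in this model card.
steps_per_epoch = 3172          # optimizer steps at the end of epoch 1
num_epochs = 5
total_train_batch_size = 64     # batch size 1 x 64 gradient accumulation steps
warmup_steps = 1586

total_steps = steps_per_epoch * num_epochs
approx_examples_per_epoch = steps_per_epoch * total_train_batch_size
warmup_fraction = warmup_steps / total_steps

print(total_steps)                # 15860, matching the last row of the table
print(approx_examples_per_epoch)  # 203008 (approximate examples seen per epoch)
print(round(warmup_fraction, 3))  # 0.1, i.e. warmup covers about 10% of training
```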



	## Model description

	More information needed

	## Intended uses & limitations

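	This model is intended for abstractive summarization of research papers, such as the arXiv articles it was fine-tuned on.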
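
A minimal inference sketch, assuming the checkpoint is published as `UNIST-Eunchan/Research-Paper-Summarization-Pegasus-x-ArXiv` and loads through the standard `transformers` auto classes. The helper name `summarize`, the 4096-token input limit, and the default device are assumptions for illustration. The decoding arguments are the 2-beam settings reported in the baseline comparison of this card, without `top_k` and `top_p`, which only apply when sampling is enabled.

```python
MODEL_ID = "UNIST-Eunchan/Research-Paper-Summarization-Pegasus-x-ArXiv"

# Beam-search settings reported in this card (max_length = 128 * 4 tokens).
GENERATION_KWARGS = {
    "length_penalty": 1,
    "num_beams": 2,
    "max_length": 128 * 4,
    "min_length": 150,
    "no_repeat_ngram_size": 3,
}


def summarize(text, device="cpu", max_input_tokens=4096):
    """Return a summary string for one paper given as plain text.

    max_input_tokens is an assumed truncation limit, not a value from the card.
    """
    # Imported lazily so the constants above can be inspected without torch installed.
    from transformers import AutoModelForSeq2SeqLM, AutoTokenizer

    tokenizer = AutoTokenizer.from_pretrained(MODEL_ID)
    model = AutoModelForSeq2SeqLM.from_pretrained(MODEL_ID).to(device)
    inputs = tokenizer(
        text, truncation=True, max_length=max_input_tokens, return_tensors="pt"
    )
    summary_ids = model.generate(
        input_ids=inputs["input_ids"].to(device),
        attention_mask=inputs["attention_mask"].to(device),
        **GENERATION_KWARGS,
    )
    return tokenizer.decode(summary_ids[0], skip_special_tokens=True)
```

Calling `summarize(paper_text)` on a string holding the body of a paper would return the generated abstract-style summary. Passing `device="cuda"` is a reasonable choice when a GPU is available.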

	## Comparison with baseline

	ROUGE scores are reported as R-1 | R-2 | R-L | R-LSum.

	- Pegasus-X-base, zero-shot: 6.2269 | 0.7894 | 4.6905 | 5.4591
	- This model, beam search with 2 beams: 43.2305 | 16.6571 | 24.4315 | 33.9399, generated with
	```python
	# top_k and top_p only take effect when do_sample=True; this call uses beam search.
	model.generate(input_ids=inputs["input_ids"].to(device),
	               attention_mask=inputs["attention_mask"].to(device),
	               length_penalty=1, num_beams=2, max_length=128*4, min_length=150,
	               no_repeat_ngram_size=3, top_k=25, top_p=0.95)
	```
	- This model, greedy decoding (see the PEGASUS-X [paper](https://arxiv.org/pdf/2208.04347.pdf)): 40.8486 | 16.3717 | 25.2937 | 33.6923, generated with
	```python
	model.generate(input_ids=inputs["input_ids"].to(device),
	               attention_mask=inputs["attention_mask"].to(device),
	               length_penalty=1, num_beams=1, max_length=128*2, top_p=1)
	```
	- This model, diverse beam search decoding: TBD | TBD | TBD | TBD, generated with
	```python
	model.generate(input_ids=inputs["input_ids"].to(device),
	               attention_mask=inputs["attention_mask"].to(device),
	               num_beam_groups=16, diversity_penalty=1.0, num_beams=16,
	               min_length=100, max_length=128*4)
	```
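
The card does not show how the ROUGE numbers were computed. One common way to obtain the four reported variants is the `rouge` metric from the Hugging Face `evaluate` library, sketched below. This is an assumption about tooling, not a record of the author's evaluation script, and the helper names are hypothetical. The metric returns scores in the 0 to 1 range, so the sketch scales them by 100 to match the scale used in this section.

```python
def to_percent(scores, digits=4):
    """Scale 0-1 ROUGE scores to the 0-100 scale used in this card."""
    return {name: round(value * 100, digits) for name, value in scores.items()}


def rouge_report(predictions, references):
    """Compute ROUGE-1/2/L/LSum for parallel lists of summaries."""
    # Lazy import: needs `pip install evaluate rouge_score`.
    import evaluate

    rouge = evaluate.load("rouge")
    scores = rouge.compute(predictions=predictions, references=references)
    # Keys returned by the metric: rouge1, rouge2, rougeL, rougeLsum.
    return to_percent(scores)


# Scaling check on made-up numbers (not results from this model).
print(to_percent({"rouge1": 0.5, "rouge2": 0.25}))  # {'rouge1': 50.0, 'rouge2': 25.0}
```

Note that `rougeLsum` treats newlines as sentence boundaries, so whether summaries are split into one sentence per line before scoring can change that value.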



	## Training and evaluation data

	We use the full `ccdv/arxiv-summarization` dataset.
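
A sketch of loading the data with the `datasets` library. The `section` configuration name comes from this card's metadata. The column names `article` and `abstract` are assumptions and should be checked against the dataset card, and newer `datasets` releases may handle this dataset differently from the version listed under framework versions.

```python
DATASET_ID = "ccdv/arxiv-summarization"
DATASET_CONFIG = "section"  # configuration named in this card's metadata


def load_arxiv_split(split="test"):
    """Load one split ('train', 'validation' or 'test') of the dataset."""
    # Lazy import so this module can be imported without `datasets` installed.
    from datasets import load_dataset

    return load_dataset(DATASET_ID, DATASET_CONFIG, split=split)


def to_pair(example, source_column="article", target_column="abstract"):
    """Map one record to a (document, reference summary) pair.

    The default column names are assumptions; adjust them if the schema differs.
    """
    return example[source_column], example[target_column]
```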

	## Training procedure

	We train in the Hugging Face ecosystem, using the `datasets` library and the `Trainer` API from `transformers`.


	### Training hyperparameters

	The following hyperparameters were used during training:
	- learning_rate: 1e-05
	- train_batch_size: 1
	- eval_batch_size: 1
	- seed: 42
	- gradient_accumulation_steps: 64
	- total_train_batch_size: 64
	- optimizer: Adam with betas=(0.9,0.999) and epsilon=1e-08
	- lr_scheduler_type: linear
	- lr_scheduler_warmup_steps: 1586
	- num_epochs: 5
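
These hyperparameters map fairly directly onto `Seq2SeqTrainingArguments` in `transformers`. The following is a reconstruction under assumptions, not the author's training script: `output_dir` is a placeholder, a single device is assumed (so batch size 1 with 64 accumulation steps gives the total batch size of 64), and any setting the card does not list is left at the library default.

```python
# Hyperparameters as reported in this card, keyed by TrainingArguments field names.
HYPERPARAMS = {
    "learning_rate": 1e-5,
    "per_device_train_batch_size": 1,
    "per_device_eval_batch_size": 1,
    "seed": 42,
    "gradient_accumulation_steps": 64,
    "adam_beta1": 0.9,
    "adam_beta2": 0.999,
    "adam_epsilon": 1e-8,
    "lr_scheduler_type": "linear",
    "warmup_steps": 1586,
    "num_train_epochs": 5,
}


def build_training_args(output_dir="pegasus-x-arxiv"):  # placeholder path
    """Create Seq2SeqTrainingArguments from the reported hyperparameters."""
    # Lazy import: requires transformers with a PyTorch backend.
    from transformers import Seq2SeqTrainingArguments

    return Seq2SeqTrainingArguments(output_dir=output_dir, **HYPERPARAMS)


# Effective batch size on one device: 1 * 64 = 64, as stated in the card.
effective_batch = (
    HYPERPARAMS["per_device_train_batch_size"]
    * HYPERPARAMS["gradient_accumulation_steps"]
)
print(effective_batch)  # 64
```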


	### Framework versions

	- Transformers 4.32.1
	- Pytorch 2.0.1
	- Datasets 2.12.0
	- Tokenizers 0.13.2