LED Paraphrase Model
This repository contains an LED-based model fine-tuned for paraphrasing tasks using the Quora dataset.
Model Overview
The LED (Longformer Encoder-Decoder) model is a Transformer variant designed for tasks that require long input contexts. This particular model is fine-tuned to generate paraphrases of input sentences, making it useful for tasks such as text simplification and query rewriting.
Model Details
- Architecture: LEDForConditionalGeneration
- Dataset: Quora dataset (subset of 20,000 samples)
- Training configuration:
  - Epochs: 1
  - Batch size: 2
  - Learning rate: 5e-5
  - Max input length: 1024 tokens
  - Max output length: 256 tokens
Repository Contents
- `pytorch_model.bin`: Model weights
- `config.json`: Model configuration
- `tokenizer_config.json`: Tokenizer configuration
- `generation_config.json`: Default text-generation settings
- `merges.txt`: BPE merge rules used by the tokenizer
- `special_tokens_map.json`: Mapping of the tokenizer's special tokens
- `vocab.json`: Tokenizer vocabulary
- `fine-tune-led.ipynb`: Jupyter notebook used to fine-tune the model
Setup
Install Dependencies
To use the model, ensure the following dependencies are installed:
- `transformers`
- `datasets`
- `torch`

They can be installed with `pip install transformers datasets torch`.
Usage
To use this model, load it via the `transformers` library. The model and tokenizer can be initialized and then used to generate paraphrases of input text, as sketched below.
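A minimal loading sketch follows; the checkpoint path is a placeholder (an assumption), so replace it with this repository's model ID or a local clone containing the files listed above.

```python
# Minimal loading sketch. `model_path` is a placeholder, not a published model ID.
from transformers import LEDForConditionalGeneration, LEDTokenizer

model_path = "path/to/this/repository"
tokenizer = LEDTokenizer.from_pretrained(model_path)
model = LEDForConditionalGeneration.from_pretrained(model_path)
```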
Training Process
Dataset
The model is trained on the Quora dataset, which consists of pairs of paraphrased questions. A subset of 20,000 samples was used for training, with 80% of the data allocated for training and 20% for evaluation.
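A sketch of how such a subset could be prepared is shown below. It assumes the Hugging Face Hub `quora` dataset and standard `datasets` utilities; the exact steps in `fine-tune-led.ipynb` may differ.

```python
# Sketch of dataset preparation (the duplicate filtering and seeds are assumptions).
from datasets import load_dataset

# Depending on the installed `datasets` version, loading this script-based
# dataset may require trust_remote_code=True.
raw = load_dataset("quora", split="train")

# Keep only question pairs marked as duplicates, i.e. actual paraphrases.
raw = raw.filter(lambda ex: ex["is_duplicate"])

# Take a 20,000-sample subset and split it 80/20 into train/eval.
subset = raw.shuffle(seed=42).select(range(20_000))
splits = subset.train_test_split(test_size=0.2, seed=42)
train_ds, eval_ds = splits["train"], splits["test"]
```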
Preprocessing
Each question pair is tokenized, and the inputs are prepared with appropriate attention masks and labels. The input sequence length is truncated to 1024 tokens, and the output sequence length is truncated to 256 tokens.
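A preprocessing sketch is shown below. The function name, column layout, and the choice to place global attention on the first token are assumptions; the notebook may implement this differently.

```python
# Sketch of the tokenization step (assumes the `quora` column layout used above).
def preprocess(example, tokenizer, max_input_len=1024, max_output_len=256):
    q1, q2 = example["questions"]["text"]  # a paraphrase pair
    model_inputs = tokenizer(q1, max_length=max_input_len, truncation=True, padding="max_length")
    labels = tokenizer(q2, max_length=max_output_len, truncation=True, padding="max_length")

    # Common practice (an assumption): mask padding in the labels so the loss ignores it.
    model_inputs["labels"] = [
        tok if tok != tokenizer.pad_token_id else -100 for tok in labels["input_ids"]
    ]

    # LED also accepts a global attention mask; placing global attention on the
    # first token is a common choice (an assumption about the notebook).
    model_inputs["global_attention_mask"] = [1] + [0] * (len(model_inputs["input_ids"]) - 1)
    return model_inputs

tokenized_train = train_ds.map(lambda ex: preprocess(ex, tokenizer))
tokenized_eval = eval_ds.map(lambda ex: preprocess(ex, tokenizer))
```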
Training
The model is fine-tuned with the `Seq2SeqTrainer` from the `transformers` library using the training arguments listed above. Gradient accumulation and a periodic evaluation strategy are employed to optimize the training process.
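A sketch of the trainer setup, using the hyperparameters listed under Model Details, is shown below. The gradient accumulation steps, evaluation strategy, and output directory are assumptions.

```python
# Trainer setup sketch (accumulation steps, eval strategy, and output_dir are assumptions).
from transformers import Seq2SeqTrainer, Seq2SeqTrainingArguments, default_data_collator

training_args = Seq2SeqTrainingArguments(
    output_dir="led-paraphrase",
    num_train_epochs=1,
    per_device_train_batch_size=2,
    per_device_eval_batch_size=2,
    learning_rate=5e-5,
    gradient_accumulation_steps=4,   # assumption
    evaluation_strategy="epoch",     # named eval_strategy in recent transformers releases
    predict_with_generate=True,
)

trainer = Seq2SeqTrainer(
    model=model,
    args=training_args,
    train_dataset=tokenized_train,
    eval_dataset=tokenized_eval,
    tokenizer=tokenizer,
    # Inputs are already padded to fixed lengths, so the default collator suffices.
    data_collator=default_data_collator,
)
trainer.train()
```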
Evaluation
The model's performance is evaluated using ROUGE and BLEU metrics (see the sketch after this list):
- ROUGE: Measures the overlap of n-grams between the generated and reference texts.
- BLEU: Measures the precision of n-grams in the generated text compared to the reference text.
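A sketch of how these scores could be computed with the Hugging Face `evaluate` library follows; `evaluate` is an extra dependency beyond those listed above, and the notebook may compute the metrics differently.

```python
# Metric computation sketch (the example strings are illustrative only).
import evaluate

rouge = evaluate.load("rouge")
bleu = evaluate.load("bleu")

predictions = ["How do I learn Python quickly?"]
references = ["What is the fastest way to learn Python?"]

rouge_scores = rouge.compute(predictions=predictions, references=references)
bleu_scores = bleu.compute(predictions=predictions, references=[[r] for r in references])
print(rouge_scores, bleu_scores)
```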
Evaluation Results
ROUGE and BLEU scores computed on the 20% evaluation split indicate the quality and accuracy of the generated paraphrases.
Example Usage
To generate a paraphrase using the trained model (see the sketch after this list):
- Load the model and tokenizer.
- Prepare the input text.
- Generate the paraphrase and decode it to readable text.
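A sketch of these steps is shown below, assuming `model` and `tokenizer` are loaded as in the Usage section; the input sentence and generation parameters are illustrative assumptions.

```python
# Paraphrase generation sketch (beam search settings are assumptions).
text = "What is the best way to learn a new language?"
inputs = tokenizer(text, max_length=1024, truncation=True, return_tensors="pt")

output_ids = model.generate(
    **inputs,
    max_length=256,
    num_beams=4,
    early_stopping=True,
)
paraphrase = tokenizer.decode(output_ids[0], skip_special_tokens=True)
print(paraphrase)
```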
References
- Hugging Face Transformers: The library used for model implementation and training.
- Quora Dataset: The dataset used for training the paraphrase model.