LED Paraphrase Model

This repository contains an LED-based model fine-tuned for paraphrasing using the Quora Question Pairs dataset.

Model Overview

The LED (Longformer Encoder-Decoder) model is a Transformer variant designed for tasks that require long input contexts. This particular model is fine-tuned to generate paraphrases of input sentences, making it useful for tasks such as text simplification and query rewriting.

Model Details

  • Architecture: LEDForConditionalGeneration
  • Dataset: Quora dataset (subset of 20,000 samples)
  • Training Configuration:
    • Epochs: 1
    • Batch size: 2
    • Learning rate: 5e-5
    • Max input length: 1024 tokens
    • Max output length: 256 tokens

Repository Contents

  • pytorch_model.bin: Model weights
  • config.json: Model configuration
  • tokenizer_config.json: Tokenizer configuration
  • generation_config.json: Default text-generation settings
  • merges.txt: BPE merge rules for the tokenizer
  • special_tokens_map.json: Special token mapping
  • vocab.json: Tokenizer vocabulary
  • fine-tune-led.ipynb: Jupyter notebook used to fine-tune the model

Setup

Install Dependencies

To use the model, ensure you have the following dependencies installed:

  • transformers
  • datasets
  • torch
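They can be installed with pip:

```
pip install transformers datasets torch
```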

Usage

To use this model, load it via the transformers library. The model and tokenizer can be initialized and used to generate paraphrases of input text.
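A minimal loading sketch (the repository id below is inferred from this repo's name and may need adjusting):

```python
from transformers import AutoTokenizer, LEDForConditionalGeneration

# Assumed repository id; replace with the actual model path or a local directory.
model_name = "Akki-off/fine-tune-led"

tokenizer = AutoTokenizer.from_pretrained(model_name)
model = LEDForConditionalGeneration.from_pretrained(model_name)
```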

Training Process

Dataset

The model is trained on the Quora Question Pairs dataset, which consists of question pairs labeled as paraphrases of one another. A subset of 20,000 samples was used, split 80/20 into training and evaluation sets.
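A sketch of this preparation, assuming the quora dataset from the Hugging Face Hub (filtering to duplicate pairs is an assumption; the exact notebook code may differ):

```python
from datasets import load_dataset

# Load the Quora Question Pairs data and keep only duplicate (paraphrase) pairs.
dataset = load_dataset("quora", split="train")
dataset = dataset.filter(lambda ex: ex["is_duplicate"])

# Subset of 20,000 samples, split 80/20 into train and eval.
dataset = dataset.shuffle(seed=42).select(range(20_000))
split = dataset.train_test_split(test_size=0.2, seed=42)
train_ds, eval_ds = split["train"], split["test"]
```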

Preprocessing

Each question pair is tokenized, and the inputs are prepared with appropriate attention masks and labels. Input sequences are truncated to 1024 tokens and target sequences to 256 tokens.
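A tokenization sketch under the same assumptions (field names follow the Hub quora dataset schema; attention masks are produced automatically by the tokenizer):

```python
max_input_length, max_output_length = 1024, 256

def preprocess(batch):
    # Each "questions" entry holds a pair; use the first as source, second as target.
    sources = [q["text"][0] for q in batch["questions"]]
    targets = [q["text"][1] for q in batch["questions"]]

    model_inputs = tokenizer(sources, max_length=max_input_length, truncation=True)
    labels = tokenizer(text_target=targets, max_length=max_output_length, truncation=True)
    model_inputs["labels"] = labels["input_ids"]
    return model_inputs

train_ds = train_ds.map(preprocess, batched=True, remove_columns=train_ds.column_names)
eval_ds = eval_ds.map(preprocess, batched=True, remove_columns=eval_ds.column_names)
```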

Training

The model is fine-tuned using the Seq2SeqTrainer from the transformers library. Gradient accumulation and periodic evaluation are employed to keep memory usage manageable at the small per-device batch size.
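A training sketch; the gradient accumulation and evaluation settings below are illustrative, since the exact values are not stated here:

```python
from transformers import (
    DataCollatorForSeq2Seq,
    Seq2SeqTrainer,
    Seq2SeqTrainingArguments,
)

training_args = Seq2SeqTrainingArguments(
    output_dir="fine-tune-led",
    num_train_epochs=1,
    per_device_train_batch_size=2,
    per_device_eval_batch_size=2,
    learning_rate=5e-5,
    gradient_accumulation_steps=4,  # illustrative; not confirmed from the notebook
    evaluation_strategy="epoch",    # illustrative; not confirmed from the notebook
    predict_with_generate=True,
)

trainer = Seq2SeqTrainer(
    model=model,
    args=training_args,
    train_dataset=train_ds,
    eval_dataset=eval_ds,
    data_collator=DataCollatorForSeq2Seq(tokenizer, model=model),
)
trainer.train()
```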

Evaluation

The model's performance is evaluated using ROUGE and BLEU metrics (a computation sketch follows the list):

  • ROUGE: Measures the overlap of n-grams between the generated and reference texts.
  • BLEU: Measures the precision of n-grams in the generated text compared to the reference text.
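A metric computation sketch using the evaluate library (an extra dependency beyond those listed under Setup); the function can be passed to the trainer as compute_metrics:

```python
import evaluate

rouge = evaluate.load("rouge")
bleu = evaluate.load("bleu")

def compute_metrics(eval_preds):
    preds, labels = eval_preds
    decoded_preds = tokenizer.batch_decode(preds, skip_special_tokens=True)

    # Labels use -100 for padding; restore the pad token id before decoding.
    labels = [[t if t != -100 else tokenizer.pad_token_id for t in seq] for seq in labels]
    decoded_labels = tokenizer.batch_decode(labels, skip_special_tokens=True)

    scores = rouge.compute(predictions=decoded_preds, references=decoded_labels)
    scores["bleu"] = bleu.compute(
        predictions=decoded_preds,
        references=[[ref] for ref in decoded_labels],
    )["bleu"]
    return scores
```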

Evaluation Results

The ROUGE and BLEU scores computed on the held-out evaluation split indicate how closely the generated paraphrases match the reference questions; higher scores reflect better paraphrase quality.

Example Usage

To generate a paraphrase using the trained model (see the snippet after the steps):

  1. Load the model and tokenizer.
  2. Prepare the input text.
  3. Generate the paraphrase and decode it to readable text.
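Putting the steps together, continuing from the loading snippet above (putting global attention on the first token is the usual LED convention, not confirmed from the notebook):

```python
import torch

text = "How can I improve my English speaking skills?"
inputs = tokenizer(text, return_tensors="pt", max_length=1024, truncation=True)

# LED expects a global attention mask; the first token gets global attention.
global_attention_mask = torch.zeros_like(inputs["attention_mask"])
global_attention_mask[:, 0] = 1

output_ids = model.generate(
    inputs["input_ids"],
    attention_mask=inputs["attention_mask"],
    global_attention_mask=global_attention_mask,
    max_length=256,
    num_beams=4,
)
print(tokenizer.decode(output_ids[0], skip_special_tokens=True))
```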
