Keyword Extraction Model
This model is a fine-tuned version of the Flan-T5 small model, adapted for extracting keywords from paragraphs. It uses the T5 encoder-decoder architecture to generate key phrases that capture the essence of the input text.
Model Description
The model takes a paragraph as input and generates a list of keywords or key phrases that summarize the main topics and themes of the text. It's particularly useful for:
- Summarizing long texts
- Generating tags for articles or blog posts
- Identifying main themes in documents
Intended Uses & Limitations
Intended Uses:
- Quick summarization of long paragraphs
- Generating metadata for content management systems
- Assisting in SEO keyword identification
Limitations:
- The model may sometimes generate irrelevant keywords
- Performance may vary depending on the length and complexity of the input text
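- For best results, use long, clean text that still fits within the token limit
- Inputs are limited to 512 tokens, the maximum input length Flan-T5 is configured for; longer inputs should be truncated or split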
- The model is trained on English text and may not perform well on other languages
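One way to work within the 512-token limit is to split a long text into sentence-aligned chunks and extract keywords from each chunk separately. The following is a minimal sketch, not part of the model itself: `split_into_chunks` and its `count_tokens` parameter are hypothetical names, and the default whitespace word count is only a rough stand-in for a real token count.

```python
import re
from typing import Callable, List


def split_into_chunks(
    text: str,
    max_tokens: int = 512,
    count_tokens: Callable[[str], int] = lambda s: len(s.split()),
) -> List[str]:
    """Greedily pack whole sentences into chunks of at most max_tokens.

    count_tokens is an assumption: by default it counts whitespace-separated
    words, which only approximates the tokenizer's real token count.
    A single sentence longer than max_tokens is kept as one oversized chunk.
    """
    # Split after sentence-ending punctuation followed by whitespace.
    sentences = re.split(r"(?<=[.!?])\s+", text.strip())
    chunks: List[str] = []
    current = ""
    for sentence in sentences:
        if not sentence:
            continue
        candidate = f"{current} {sentence}".strip()
        if current and count_tokens(candidate) > max_tokens:
            # Adding this sentence would exceed the budget: close the chunk.
            chunks.append(current)
            current = sentence
        else:
            current = candidate
    if current:
        chunks.append(current)
    return chunks
```

For an accurate count, pass a function based on the model's tokenizer, for example `lambda s: len(tokenizer(s)["input_ids"])`, and leave a small margin below 512 for special tokens.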
Training and Evaluation
The model was fine-tuned on a dataset of English Wikipedia paragraphs and their corresponding keywords. The dataset covers a diverse range of topics to support broad applicability.
How to Use
Here's a simple example of how to use the model:
from transformers import AutoModelForSeq2SeqLM, AutoTokenizer
model_name = "agentlans/flan-t5-small-keywords"
model = AutoModelForSeq2SeqLM.from_pretrained(model_name)
tokenizer = AutoTokenizer.from_pretrained(model_name)
input_text = "Your paragraph here..."
# Truncate inputs that exceed the 512-token limit
inputs = tokenizer(input_text, return_tensors="pt", truncation=True, max_length=512)
outputs = model.generate(**inputs, max_length=512)
decoded_output = tokenizer.decode(outputs[0], skip_special_tokens=True)
# Process the output to get a list of keywords (split and remove duplicates)
keywords = list(set(decoded_output.split('||')))
print(keywords)
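Because `set` does not preserve order, the keyword list above may come back in a different order on each run. If a stable order matters, the post-processing step could be sketched as follows. `parse_keywords` is a hypothetical helper, and the whitespace stripping and case-insensitive comparison are assumptions about what is desirable, not documented model behavior.

```python
from typing import List


def parse_keywords(decoded_output: str, separator: str = "||") -> List[str]:
    """Split the decoded model output into keywords, keeping first-seen order.

    Strips surrounding whitespace, drops empty entries, and removes
    duplicates that differ only in letter case.
    """
    seen = set()
    keywords: List[str] = []
    for part in decoded_output.split(separator):
        keyword = part.strip()
        key = keyword.lower()
        if keyword and key not in seen:
            seen.add(key)
            keywords.append(keyword)
    return keywords
```

Calling `parse_keywords(decoded_output)` in place of `list(set(decoded_output.split('||')))` returns the keywords in the order the model generated them.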
Example input paragraph:
In the heart of the bustling city, a hidden gem awaits discovery: a quaint little bookstore that seems to have escaped the relentless march of time. As you step inside, the scent of aged paper and rich coffee envelops you, creating an inviting atmosphere that beckons you to explore its shelves. Each corner is adorned with carefully curated collections, from classic literature to contemporary bestsellers, inviting readers of all tastes to lose themselves in the pages of a good book. The soft glow of warm lighting casts a cozy ambiance, while the gentle hum of conversation among fellow book lovers adds to the charm. This bookstore is not just a place to buy books; it's a sanctuary for those seeking solace, inspiration, and a sense of community in the fast-paced world outside.
Example output keywords:
['old paper coffee scent', 'cosy hum of conversation', 'quaint bookstore', 'community in the fast-paced world', 'solace inspiration', 'curated collections']
Limitations and Bias
This model has been trained on English Wikipedia paragraphs, which may introduce biases. Users should be aware that the keywords generated might reflect these biases and should use the output judiciously.
Training Details
- Training Data: dataset of Wikipedia paragraphs and keywords
- Training Procedure: Fine-tuning of google/flan-t5-small
Training hyperparameters
The following hyperparameters were used during training:
- learning_rate: 5e-05
- train_batch_size: 8
- eval_batch_size: 8
- seed: 42
- optimizer: Adam with betas=(0.9,0.999) and epsilon=1e-08
- lr_scheduler_type: linear
- num_epochs: 10.0
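To illustrate what the linear scheduler does with the learning rate listed above, here is a small sketch of the decay rule. It assumes zero warmup steps, since none are listed, and the total step count in the usage note is hypothetical because it depends on the dataset size, which is not stated.

```python
def linear_lr(
    step: int,
    total_steps: int,
    base_lr: float = 5e-5,
    warmup_steps: int = 0,
) -> float:
    """Learning rate at a given step under linear warmup and linear decay.

    With warmup_steps=0 (an assumption), the rate starts at base_lr and
    falls linearly to 0 at total_steps.
    """
    if warmup_steps > 0 and step < warmup_steps:
        # Linear ramp from 0 up to base_lr during warmup.
        return base_lr * step / warmup_steps
    remaining = max(0, total_steps - step)
    return base_lr * remaining / max(1, total_steps - warmup_steps)
```

For a hypothetical run of 1,000 optimizer steps, `linear_lr(500, 1000)` gives half the initial rate, and `linear_lr(1000, 1000)` gives zero.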
Framework versions
- Transformers 4.45.1
- PyTorch 2.4.1+cu121
- Datasets 3.0.1
- Tokenizers 0.20.0
Ethical Considerations
When using this model, consider the potential impact of automated keyword extraction on content creation and SEO practices. Ensure that the use of this model complies with relevant guidelines and does not contribute to the creation of misleading or spammy content.