Model Card for gemma-2-2b-it-research-in-a-flash
- Fine-tune the Gemma2 2b model for summarizing scientific papers.
- Filter the dataset for computer science papers to optimize training time.
- Deploy the model on Hugging Face for easy accessibility.
Model Details
Model Description
his model is a fine-tuned version of google/gemma-2-2b-it
on the cnn_dailymail
dataset, designed for the task of summarization.
It can summarize paragraphs of text, especially from research papers or news articles, into concise summaries.
The model has been fine-tuned using the LoRA (Low-Rank Adaptation) method for parameter-efficient training.
This is the model card of a 🤗 transformers model that has been pushed on the Hub. This model card has been automatically generated.
- Developed by: Changjip Moon
- Model type: Summarization
- Language(s) (NLP): Korean, English
- License: Apache 2.0
- Finetuned from model [optional]: google/gemma-2-2b-it
Model Sources [optional]
Uses
Direct Use
This model can be used to generate concise summaries of long texts. It is designed for summarizing academic papers, research materials, or news articles.
Downstream Use
This model can be fine-tuned further for other languages or summarization-specific tasks like topic-based summarization.
Out-of-Scope Use
This model is not designed for tasks outside of text summarization, such as text classification or question answering. It also may not perform well on non-news or non-research data.
Bias, Risks, and Limitations
This model may have biases inherited from the cnn_dailymail
dataset, which is mainly based on news articles in English. It may not perform well on non-news content or in cases where high precision is required for legal, medical, or sensitive content.
Training Details
Training Data
The model was fine-tuned on the cnn_dailymail
dataset, which contains articles and summaries from CNN and Daily Mail. The dataset is commonly used for text summarization tasks.
Training Procedure
The model was trained using the following hyperparameters:
- Learning rate: 2e-4
- Batch size: 8 (with gradient accumulation steps of 4)
- Epochs: 1
- Max sequence length: 256
- Optimization method: AdamW with 8-bit quantization
Preprocessing
Standard tokenization and truncation were applied. The maximum sequence length was set to 256 to balance memory usage and training speed.
Training Hyperparameters
- Training regime: go to google colab pages if you want to know
Speeds, Sizes, Times
[2500/2500 22:33, Epoch 1/1] : Cause of timeout issue, I need to make a subset of data..
- Downloads last month
- 18