Model Card for gemma-2-2b-it-research-in-a-flash

Fine-tune the Gemma2 2b model for summarizing scientific papers.
Filter the dataset for computer science papers to optimize training time.
Deploy the model on Hugging Face for easy accessibility.

Model Details

Model Description

his model is a fine-tuned version of google/gemma-2-2b-it on the cnn_dailymail dataset, designed for the task of summarization. It can summarize paragraphs of text, especially from research papers or news articles, into concise summaries. The model has been fine-tuned using the LoRA (Low-Rank Adaptation) method for parameter-efficient training.

This is the model card of a 🤗 transformers model that has been pushed on the Hub. This model card has been automatically generated.

Developed by: Changjip Moon
Model type: Summarization
Language(s) (NLP): Korean, English
License: Apache 2.0
Finetuned from model [optional]: google/gemma-2-2b-it

Model Sources [optional]

Demo: https://colab.research.google.com/drive/1xiyWCnTzXmFFgD7CBL-jq8m2Mv29fg-M?usp=sharing

Uses

Direct Use

This model can be used to generate concise summaries of long texts. It is designed for summarizing academic papers, research materials, or news articles.

Downstream Use

This model can be fine-tuned further for other languages or summarization-specific tasks like topic-based summarization.

Out-of-Scope Use

This model is not designed for tasks outside of text summarization, such as text classification or question answering. It also may not perform well on non-news or non-research data.

Bias, Risks, and Limitations

This model may have biases inherited from the cnn_dailymail dataset, which is mainly based on news articles in English. It may not perform well on non-news content or in cases where high precision is required for legal, medical, or sensitive content.

Training Details

Training Data

The model was fine-tuned on the cnn_dailymail dataset, which contains articles and summaries from CNN and Daily Mail. The dataset is commonly used for text summarization tasks.

Training Procedure

The model was trained using the following hyperparameters:

Learning rate: 2e-4
Batch size: 8 (with gradient accumulation steps of 4)
Epochs: 1
Max sequence length: 256
Optimization method: AdamW with 8-bit quantization

Preprocessing

Standard tokenization and truncation were applied. The maximum sequence length was set to 256 to balance memory usage and training speed.

Training Hyperparameters

Training regime: go to google colab pages if you want to know

Speeds, Sizes, Times

[2500/2500 22:33, Epoch 1/1] : Cause of timeout issue, I need to make a subset of data..

dwhouse
/

gemma-2-2b-it-research-in-a-flash