Layer Freezing and Transformer-Based Data Curation for Enhanced Transfer Learning in YOLO Architectures

Abstract

The You Only Look Once (YOLO) architecture has transformed real-time object detection by performing detection, localization, and classification in a single pass. Despite its efficiency, balancing accuracy against computational cost remains a challenge, particularly in resource-constrained environments. This research investigates the impact of layer freezing in YOLO models, a transfer learning technique that adapts a pretrained model without extensive retraining. We evaluate several YOLO configurations, including YOLOv8 and YOLOv10, on four datasets chosen for their relevance to real-world applications, particularly the monitoring and inspection of critical infrastructure, including scenarios involving unmanned aerial vehicles (UAVs). Our findings show that freezing selected layers can significantly reduce training time and GPU consumption while matching or even surpassing the accuracy of traditional fine-tuning. In particular, the small YOLOv10 variant with layer freezing achieved a mAP@50 of 0.84 on one of the datasets, with 28% lower GPU usage and nearly 3% higher mAP than full fine-tuning. Beyond the mean Average Precision (mAP) metrics themselves, we also aimed to maintain performance with less data by capturing the source distribution more efficiently. For three of the four datasets, curating the training split with a strategy based on Vision Transformers and a cosine similarity metric let us use 30% less training data at the cost of a 3% reduction in both mAP@50 and mAP@50:95.
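
To make the data curation idea more concrete, here is a minimal sketch of similarity-based filtering. It is not the authors' implementation: the greedy selection rule, the `curate` function, and the 0.95 threshold are assumptions chosen for illustration, and the toy 2-D vectors stand in for Vision Transformer image embeddings, which would normally have hundreds of dimensions.

```python
import math
from typing import List, Sequence


def cosine_similarity(a: Sequence[float], b: Sequence[float]) -> float:
    """Cosine of the angle between two equal-length embedding vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0  # treat a zero vector as dissimilar to everything
    return dot / (norm_a * norm_b)


def curate(embeddings: Sequence[Sequence[float]], threshold: float = 0.95) -> List[int]:
    """Greedily keep indices whose embedding is not a near-duplicate of a kept one.

    An image is kept only if its similarity to every already-kept image is
    below `threshold` (a hypothetical value; tune it per dataset).
    """
    kept: List[int] = []
    for i, emb in enumerate(embeddings):
        if all(cosine_similarity(emb, embeddings[j]) < threshold for j in kept):
            kept.append(i)
    return kept


# Toy 2-D "embeddings" standing in for ViT features (hypothetical values).
toy = [[1.0, 0.0], [0.99, 0.05], [0.0, 1.0], [0.7, 0.7]]
kept_indices = curate(toy, threshold=0.95)
print(kept_indices)
```

In this toy run the second vector is nearly parallel to the first and is dropped, while the other three are kept. With real embeddings, a vectorized library would be a better fit than pure Python loops.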

Installation
Usage
Examples
Contributing
License

Installation

Using pip

Clone the repository:

```
git clone https://huggingface.co/AndrzejDD/enhanced-transfer-learning
cd enhanced-transfer-learning
```

Create a virtual environment (optional but recommended):

```
python -m venv enhanced-tl
source enhanced-tl/bin/activate  # On Windows use `enhanced-tl\Scripts\activate`
```

Install the required packages:
```
pip install -r requirements.txt
```

Using conda

Clone the repository:

```
git clone https://huggingface.co/AndrzejDD/enhanced-transfer-learning
cd enhanced-transfer-learning
```

Create a conda environment from the provided environment file:
```
conda env create -f environment.yml
```
Activate the conda environment:
```
conda activate enhanced-tl
```

After completing these steps, the required dependencies will be installed, and you can start training your models.

Usage

To display the help message and see all available options, run the following command:

```
python3 main.py --help
```

Example Output

When you run the help command, you will see an output like this:

```
usage: main.py [-h] [--dataset DATASET_NAME] [--epochs EPOCHS] [--batch BATCH] [--imgsz IMGSZ]
               [--patience PATIENCE] [--cache CACHE] [--pretrained] [--cos_lr] [--profile] [--plots] [--resume]
               [--model MODEL_NAME] [--run RUN_NAME]

options:
  -h, --help            show this help message and exit
  --dataset DATASET_NAME
                        Dataset name to be used
  --epochs EPOCHS       Number of epochs for training
  --batch BATCH         Batch size
  --imgsz IMGSZ         Image size for training
  --patience PATIENCE   Early stopping patience
  --cache CACHE         Caching mechanism to use
  --pretrained          Use pretrained weights
  --cos_lr              Use cosine learning rate schedule
  --profile             Enable training profiling
  --plots               Generate training plots
  --resume              Resume training from a checkpoint
  --model MODEL_NAME
                        Name of the YOLO model to use
  --run RUN_NAME
                        Name of the run configuration
```
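
For readers who want to see which options take a value and which are simple switches, the help text above is consistent with an `argparse` setup along these lines. This is a sketch reconstructed from the help output only; the real `main.py` may differ, and the `int` types, the absence of defaults, and the `build_parser` helper are assumptions.

```python
import argparse


def build_parser() -> argparse.ArgumentParser:
    """Build a parser mirroring the options listed in the help message."""
    parser = argparse.ArgumentParser(prog="main.py")

    # Options that take a value (numeric types are assumed, not confirmed).
    parser.add_argument("--dataset", metavar="DATASET_NAME", help="Dataset name to be used")
    parser.add_argument("--epochs", type=int, help="Number of epochs for training")
    parser.add_argument("--batch", type=int, help="Batch size")
    parser.add_argument("--imgsz", type=int, help="Image size for training")
    parser.add_argument("--patience", type=int, help="Early stopping patience")
    parser.add_argument("--cache", help="Caching mechanism to use")

    # Boolean switches: present means True, absent means False.
    switches = [
        ("--pretrained", "Use pretrained weights"),
        ("--cos_lr", "Use cosine learning rate schedule"),
        ("--profile", "Enable training profiling"),
        ("--plots", "Generate training plots"),
        ("--resume", "Resume training from a checkpoint"),
    ]
    for flag, text in switches:
        parser.add_argument(flag, action="store_true", help=text)

    parser.add_argument("--model", metavar="MODEL_NAME", help="Name of the YOLO model to use")
    parser.add_argument("--run", metavar="RUN_NAME", help="Name of the run configuration")
    return parser


# Parse an explicit argument list instead of sys.argv so the sketch runs anywhere.
args = build_parser().parse_args(
    ["--dataset", "Dataset Name", "--epochs", "1000", "--batch", "16",
     "--model", "yolov10s", "--run", "Finetuning", "--plots"]
)
print(args.dataset, args.epochs, args.plots, args.pretrained)
```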

To run the project, use the following command:

```
python3 main.py --dataset "Dataset Name" --epochs 1000 --batch 16 --imgsz 640 --patience 30 --model "yolov10s" --run "Finetuning"
```

Examples

Example 1: Fine-Tuning the YOLOv10 Model

To fine-tune the YOLOv10 small model (yolov10s) with frozen backbone layers, run:

```
python3 main.py --dataset "Dataset Name" --epochs 1000 --batch 16 --pretrained --plots --model "yolov10s" --run "Finetuning"
```
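
How this repository selects the layers to freeze is not shown here, so the following is only a sketch of the general idea. It assumes parameter names of the form `model.<index>.<rest>`, as is common for detectors built from a sequential list of modules, and treats the first N modules as the frozen part. The `split_trainable` helper and the example names are hypothetical. In PyTorch, the parameters in the "frozen" group would then have `requires_grad` set to `False`.

```python
from typing import Dict, Iterable, List


def frozen_prefixes(num_frozen_layers: int) -> List[str]:
    """Name prefixes of the first N modules; the trailing dot keeps
    'model.1.' from also matching 'model.10.'."""
    return [f"model.{i}." for i in range(num_frozen_layers)]


def split_trainable(param_names: Iterable[str], num_frozen_layers: int) -> Dict[str, List[str]]:
    """Partition parameter names into frozen and trainable groups."""
    prefixes = tuple(frozen_prefixes(num_frozen_layers))
    result: Dict[str, List[str]] = {"frozen": [], "trainable": []}
    for name in param_names:
        key = "frozen" if name.startswith(prefixes) else "trainable"
        result[key].append(name)
    return result


# Hypothetical parameter names for illustration only.
names = [
    "model.0.conv.weight",
    "model.1.conv.weight",
    "model.10.conv.weight",
    "model.23.cv2.weight",
]
plan = split_trainable(names, num_frozen_layers=2)
print(plan)
```

Freezing fewer parameters means fewer gradients to compute and store, which is the general reason this technique can reduce training time and GPU memory use.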

License

Please check the license of each dataset at its source, and review the licenses of the YOLOv10 and YOLOv8 models as well. The original datasets used in this research are: InsPLAD-det (https://github.com/andreluizbvs/InsPLAD/tree/main), Electric Substation (https://figshare.com/articles/dataset/A_YOLO_Annotated_15-class_Ground_Truth_Dataset_for_Substation_Equipment/24060960), VALID (https://sites.google.com/view/valid-dataset), and Birds Nest (https://zenodo.org/records/4015912#.X1O_0osRVPY).
