One Initialization to Rule them All: Fine-tuning via Explained Variance Adaptation
Abstract
Foundation models (FMs) are pre-trained on large-scale datasets and then fine-tuned on a downstream task for a specific application. The most successful and most commonly used fine-tuning method is to update the pre-trained weights via a low-rank adaptation (LoRA). LoRA introduces new weight matrices that are usually initialized at random with a uniform rank distribution across model weights. Recent works focus either on weight-driven initialization or on learning adaptive ranks during training. These two approaches have only been investigated in isolation, resulting in slow convergence or a uniform rank distribution, which in turn leads to sub-optimal performance. We propose to enhance LoRA by initializing the new weights in a data-driven manner: we compute the singular value decomposition (SVD) of minibatches of activation vectors. We then initialize the LoRA matrices with the obtained right-singular vectors and redistribute ranks among all weight matrices to explain the maximal amount of variance, before continuing the standard LoRA fine-tuning procedure. This results in our new method Explained Variance Adaptation (EVA). We apply EVA to a variety of fine-tuning tasks ranging from language generation and understanding to image classification and reinforcement learning. EVA exhibits faster convergence than competitors and attains the highest average score across a multitude of tasks per domain.
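The two ingredients described above (SVD of activation minibatches for initialization, and rank redistribution by explained variance) can be sketched as follows. This is an illustrative sketch only, not the authors' implementation; the function names, the greedy allocation scheme, and the mean-centering step are assumptions.

```python
import numpy as np

def eva_init(activations: np.ndarray, rank: int):
    """SVD of a minibatch of activation vectors (batch x features).
    Returns the top-`rank` right-singular vectors (candidates for
    initializing the LoRA down-projection) and their
    explained-variance ratios. Mean-centering is an assumption here."""
    X = activations - activations.mean(axis=0, keepdims=True)
    # Economy SVD: the rows of Vt are the right-singular vectors of X.
    _, S, Vt = np.linalg.svd(X, full_matrices=False)
    explained = (S ** 2) / np.sum(S ** 2)  # variance ratio per component
    return Vt[:rank], explained[:rank]

def redistribute_ranks(layer_explained: dict, total_rank: int):
    """Assign a global rank budget greedily to the layers whose next
    singular component explains the most variance (a simple stand-in
    for EVA's adaptive rank allocation)."""
    ranks = {name: 0 for name in layer_explained}
    # Pool all (variance, layer) pairs and keep the top `total_rank`.
    pool = [(var, name) for name, evs in layer_explained.items() for var in evs]
    for _, name in sorted(pool, reverse=True)[:total_rank]:
        ranks[name] += 1
    return ranks
```

Under this sketch, a layer whose activations concentrate variance in few directions receives more of the shared rank budget, while layers with flat spectra receive fewer ranks.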
Community
Code: https://github.com/ml-jku/EVA
Minimal working example for EVA in the PEFT library: https://github.com/sirluk/peft/blob/main/examples/eva_finetuning/eva_finetuning.py
This is an automated message from the Librarian Bot. I found the following papers similar to this paper.
The following papers were recommended by the Semantic Scholar API
- HUT: A More Computation Efficient Fine-Tuning Method With Hadamard Updated Transformation (2024)
- SVFit: Parameter-Efficient Fine-Tuning of Large Pre-Trained Models Using Singular Values (2024)
- PMSS: Pretrained Matrices Skeleton Selection for LLM Fine-tuning (2024)
- LoRA2: Multi-Scale Low-Rank Approximations for Fine-Tuning Large Language Models (2024)
- NEAT: Nonlinear Parameter-efficient Adaptation of Pre-trained Models (2024)