Could you share the VRAM used for finetuning?
Thanks!
Around 39 GB of VRAM was used. That's with Flash Attention 2, a micro batch size of 2, and the sequence length capped at 4096.
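For reference, those settings would look something like this in an axolotl-style config (a sketch of the relevant keys only, not the full config actually used):

```yaml
# Settings matching the reported ~39 GB run (other keys omitted)
micro_batch_size: 2
sequence_len: 4096
flash_attention: true
```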