Target module question

#1
by nicolollo - opened

Why not target "q_proj", "o_proj", "k_proj", "v_proj", "linear", "Conv2d", "lm_head", and "fc2"?

Hi @nicolollo , I had to compromise on the layers I targeted because my system isn't that powerful, and training all of them would have taken a long time.
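To make the tradeoff concrete, here is a minimal sketch of why targeting fewer modules keeps training cheaper: each Linear layer adapted with a rank-r LoRA adapter adds roughly r * (in_features + out_features) trainable parameters, so adding the MLP projections on top of the attention projections grows the adapter size substantially. The hidden sizes, module names, and the `lora_params` helper below are illustrative assumptions, not the actual model's dimensions.

```python
# Hypothetical sketch: rough LoRA adapter-parameter counts for two
# target-module choices. Shapes are assumed, not taken from the model.

def lora_params(shapes, rank=8):
    """A rank-r adapter on Linear(in_f, out_f) adds rank * (in_f + out_f) params."""
    return sum(rank * (in_f + out_f) for in_f, out_f in shapes)

hidden = 768  # assumed hidden size

# q_proj, k_proj, v_proj, o_proj: square projections (assumed)
attn = [(hidden, hidden)] * 4
# fc1, fc2: a 4x-expansion MLP (assumed)
mlp = [(hidden, 4 * hidden), (4 * hidden, hidden)]

attn_only = lora_params(attn)        # attention projections only
attn_mlp = lora_params(attn + mlp)   # attention + MLP projections

print(attn_only, attn_mlp)  # per-layer counts; the second is >2x the first
```

Multiplied across every transformer layer, that per-layer difference translates directly into more optimizer state, more gradient memory, and longer training, which is the compromise described above.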

Oh, I see, thanks. May I ask what batch size you used? And why the base FT instead of the base?

I used a batch size of 8 for this. I tried a batch size of 16 as well, but it ran out of GPU memory.

As for the model selection, I chose the base FT version since it already produces great captions, so my focus was on making small quality improvements.

NikshepShetty changed discussion status to closed
