0.7071067812

#1
by jukofyork - opened

Just found your repo looking for a copy of Meta-Llama-3-8B without the extra ./original to download and noticed you had asked this:

As I understand it, the idea is to make the model think twice but leap the same distance as the original. But why 0.7071067812?

It's the scale factor to use, i.e. solve x^2 = 1/2 --> x = 1/sqrt(2) ≈ 0.7071067812

The strange number comes from the fact that the k_proj and q_proj matrices get multiplied together, eg:

If these just contained a single number (ie: a scalar) and the inputs were k_input and q_input scalars:

output = (k_input x k_proj) x (q_input x q_proj)

output = 1.0 x k_input x k_proj x q_input x q_proj

and if you tried to halve the k_proj and q_proj values hoping to halve the output:

output = (k_input x 0.5 x k_proj) x (q_input x 0.5 x q_proj)

output = k_input x 0.5 x k_proj x q_input x 0.5 x q_proj

output = 0.5 x 0.5 x k_input x k_proj x q_input x q_proj

output = 0.25 x k_input x k_proj x q_input x q_proj

so as you can see the output will be quartered and not halved!

So the solution is to use sqrt(0.5) like this:

output = (k_input x sqrt(0.5) x k_proj) x (q_input x sqrt(0.5) x q_proj)

output = k_input x sqrt(0.5) x k_proj x q_input x sqrt(0.5) x q_proj

output = sqrt(0.5) x sqrt(0.5) x k_input x k_proj x q_input x q_proj

output = 0.5 x k_input x k_proj x q_input x q_proj

which is now correctly halving the output!

The decimal representation of sqrt(0.5) is 0.7071067812 and this is where the number comes from.
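The same effect shows up with real projection matrices, not just scalars: scaling both k_proj and q_proj by 0.5 quarters the attention logit, while scaling both by sqrt(0.5) halves it. Here's a small numpy sketch (the shapes and names are just illustrative, not Llama's actual dimensions):

```python
import math
import numpy as np

rng = np.random.default_rng(0)

# Toy Q/K projections and inputs (dimensions are illustrative only).
d = 8
q_proj = rng.standard_normal((d, d))
k_proj = rng.standard_normal((d, d))
q_input = rng.standard_normal(d)
k_input = rng.standard_normal(d)

def attn_logit(q_scale, k_scale):
    # An attention logit is the dot product of the projected query and key.
    q = q_input @ (q_scale * q_proj)
    k = k_input @ (k_scale * k_proj)
    return q @ k

base = attn_logit(1.0, 1.0)

# Scaling both projections by 0.5 quarters the logit...
assert math.isclose(attn_logit(0.5, 0.5), 0.25 * base)

# ...while scaling both by sqrt(0.5) ≈ 0.7071067812 halves it.
s = math.sqrt(0.5)
assert math.isclose(attn_logit(s, s), 0.5 * base)
```

Since the two scale factors just multiply together, you could equally put the full 0.5 on only one of the two matrices, but splitting it symmetrically as sqrt(0.5) on each keeps the Q and K magnitudes balanced.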

Hope this helps! :)

Can’t believe you made a discussion to my model! I’m a huge fan of your control vectors, merging, MIQU models, etc. Your passion for this work always thrills me! Thank you for the kind explanation, and keep up the great work as always. Regards!

Thanks! :)
