405B or 410B?
#8 by alielfilali01 - opened
The name and the advertising suggest 405B, but the safetensors tag shows the model as 410B! Given the overall size the difference may be negligible, but that is still 5B params not accounted for. Is there any specific reason?
@Ali-C137 it's probably ignoring the embedding params
According to the Llama 3 tech paper, the 405B model is supposed to use 8 key-value heads (the same as 8B and 70B); in that case, the model would be 405B (with embeddings). They later changed to 16 key-value heads (the currently published model) but did not want to change the model name. They should mention it in the tech paper, though.
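For anyone who wants to check the arithmetic behind the two explanations above, here is a rough sketch. The architecture values are assumptions on my part (126 layers, model dim 16384, FFN dim 53248, 128 attention heads, vocab 128256, no biases, untied input and output embeddings, a standard Llama layer layout), and `llama_param_count` is a hypothetical helper, not part of any library.

```python
def llama_param_count(n_layers, d_model, d_ffn, n_heads, n_kv_heads,
                      vocab_size, tied_embeddings=False):
    """Estimate the parameter count of a Llama-style decoder (no biases)."""
    head_dim = d_model // n_heads
    # q_proj and o_proj are both d_model x d_model
    attn = 2 * d_model * d_model
    # k_proj and v_proj shrink with grouped-query attention
    attn += 2 * d_model * n_kv_heads * head_dim
    # SwiGLU MLP: gate, up and down projections
    mlp = 3 * d_model * d_ffn
    # two RMSNorm weight vectors per layer
    norms = 2 * d_model
    per_layer = attn + mlp + norms
    # input embedding plus (if untied) the output head
    embed = vocab_size * d_model
    embed_total = embed if tied_embeddings else 2 * embed
    # the trailing d_model is the final RMSNorm
    return n_layers * per_layer + embed_total + d_model


# Assumed 405B architecture values (not taken from this thread).
ASSUMED = dict(n_layers=126, d_model=16384, d_ffn=53248,
               n_heads=128, vocab_size=128256)

with_8_kv = llama_param_count(n_kv_heads=8, **ASSUMED)
with_16_kv = llama_param_count(n_kv_heads=16, **ASSUMED)
embeddings = 2 * ASSUMED["vocab_size"] * ASSUMED["d_model"]

print(f"8 KV heads:  {with_8_kv / 1e9:.2f}B")    # 405.85B
print(f"16 KV heads: {with_16_kv / 1e9:.2f}B")   # 410.08B
print(f"embeddings:  {embeddings / 1e9:.2f}B")   # 4.20B
```

Under these assumptions, going from 8 to 16 key-value heads adds about 4.2B parameters, which lines up with the 405B vs 410B gap. The two embedding matrices also come to about 4.2B, so the total alone does not tell the two explanations apart.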