Is RoPE scaling correct?
#2, opened by Noeda
RoPE theta is set to 100k here: https://huggingface.co/keyfan/grok-1-hf/blob/main/config.json#L30 (unless I missed it being overridden somewhere in the code).
In the original implementation it's 10k: https://github.com/xai-org/grok-1/blob/main/model.py#L801
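To make the difference concrete, here is a rough sketch of how the base (theta) enters the usual RoPE inverse-frequency formula, `theta ** (-2i / head_dim)`. The function name `rope_inv_freq` and the head dimension of 128 are assumptions chosen for illustration; neither is taken from the linked repositories.

```python
def rope_inv_freq(head_dim, theta):
    """Standard RoPE inverse frequencies: theta ** (-2i / head_dim) for i in [0, head_dim // 2)."""
    return [theta ** (-(2 * i) / head_dim) for i in range(head_dim // 2)]


HEAD_DIM = 128  # illustrative assumption, not read from either config

freq_10k = rope_inv_freq(HEAD_DIM, 10_000.0)    # base used in the original model.py
freq_100k = rope_inv_freq(HEAD_DIM, 100_000.0)  # base found in the HF config.json

# The fastest-rotating pair is unaffected, since theta ** 0 == 1 for any base.
print("first pair:", freq_10k[0], freq_100k[0])

# The slowest-rotating pair differs by almost a factor of 10,
# so positions are encoded differently from what the weights were trained with.
print("last pair: ", freq_10k[-1], freq_100k[-1])
print("ratio:     ", freq_10k[-1] / freq_100k[-1])
```

Because only the first pair matches, a mismatched base changes the rotation angle applied at nearly every dimension, and the discrepancy grows with token position.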
You're right, I forgot to correct that. Thank you for pointing this out.
Thanks :) Also, thanks for the HF version. It's much easier to follow than the original JAX implementation.
Noeda changed discussion status to closed