Is RoPE scaling correct?

#2
by Noeda - opened

RoPE theta is 100k here: https://huggingface.co/keyfan/grok-1-hf/blob/main/config.json#L30 (unless I missed it being overridden somewhere in the code).

It's 10k here: https://github.com/xai-org/grok-1/blob/main/model.py#L801
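For context on why the mismatch matters, here is a minimal sketch of the standard RoPE frequency schedule. It is not taken from either repo. Rotation pair i gets frequency theta^(-2i/head_dim), so changing theta from 10k to 100k changes the rotation angle of every pair except the first. The head_dim of 128 and the helper names `rope_inv_freq` and `rope_angles` are assumptions chosen for illustration.

```python
import math


def rope_inv_freq(head_dim: int, theta: float) -> list[float]:
    """Per-pair rotation frequencies for rotary position embeddings (RoPE).

    Pair i (0 <= i < head_dim // 2) rotates by pos * theta ** (-2i / head_dim) radians.
    """
    return [theta ** (-(2 * i) / head_dim) for i in range(head_dim // 2)]


def rope_angles(pos: int, head_dim: int, theta: float) -> list[float]:
    """Rotation angle (radians) applied to each pair at token position `pos`."""
    return [pos * f for f in rope_inv_freq(head_dim, theta)]


head_dim = 128  # assumed head dimension, for illustration only
for theta in (10_000.0, 100_000.0):
    freqs = rope_inv_freq(head_dim, theta)
    # The slowest-rotating pair's wavelength (in tokens) grows with theta.
    longest_wavelength = 2 * math.pi / freqs[-1]
    print(f"theta={theta:>9.0f}  lowest freq={freqs[-1]:.3e}  "
          f"longest wavelength ~{longest_wavelength:,.0f} tokens")
```

The first pair always has frequency 1.0, but every other pair rotates more slowly with theta = 100k. A model whose weights were trained with one theta would therefore see different positional phases if it ran with the other.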

Owner

You're right, I forgot to correct that. Thank you for spotting this.

Thanks :) Also thanks for the HF version. It's much easier to follow than the original JAX implementation.

Noeda changed discussion status to closed
