#5
by gileneo - opened

Reflection 70B benchmarks are not real

The whole drama is described here:
https://x.com/shinboson/status/1832933753837982024

This is literally a model posted for you to run.

You're making an assumption based on a broken OpenRouter implementation that no one can reproduce.

It was not OpenRouter's implementation; they just forwarded requests to Matt's privately hosted API (which was just a proxy for Sonnet 3.5).

@nisten

The evidence from the tokenisation, the <META> tag, getting it to output its system message, and the questions that revealed it was really Claude is clear proof that it wasn't the correct endpoint.

If it had been an honest mistake and OpenRouter was accidentally routing requests to the wrong endpoint, it wouldn't have been filtering out and replacing the word "Claude".

The model also changed to GPT-4o (I assume; they switched away from that pretty quickly).