>> These benchmark numbers cannot be real for a 7b model
> LLM benchmarks are mostly bullshit right now. Wait a few years until the hype cycle returns to sanity.
This could mean a lot of things. Can you be a bit more specific? It's one thing to say benchmarks are gamed. Another to say models end up being trained on the benchmark indirectly. Another to say that the particular experimental setup during the benchmark is unclear. Another to say mapping a benchmark to a real use case is hard. Are you making some/all of these claims?
Have you plotted MiMo versus others? Another comment suggests smaller models are performing better than expected. Any comment on that?