Well yes, but there is no better way to measure without resorting to pure hearsay. How would you make an accurate assessment of something so inherently vague?
Alter the benchmark space that we care about, for example by focusing only on ARC-AGI-2, and suddenly the gains are no longer diminishing but accelerating.