This reminds me of the idea that human-chess partnerships ("centaur chess") would be the ultimate manifestation of chess genius. I'm not sure whether that idea still holds, but engines are now so devastatingly far ahead of human play, especially since the advent of machine learning techniques, that I doubt a human in the loop can add anything these days.
Chess reminds me more of programming, given that both operate under well-defined rules. However, I'm biased: I work in radiology and program more as a hobby, and so far I've seen far more tools that help me code than tools that accurately detect radiologic findings.
You can compare the best image synthesis and image understanding from two decades ago (SIFT / HOG), from a decade ago (CNNs, stacked denoising autoencoders), and now (Transformers). The progress is remarkable: we went from unreliably recognizing a face to outperforming human professionals on some benchmarks (see MMMU).
AIUI (and I may be wrong), each of the technologies you mention was a "breakthrough", not an iterative improvement. Along that vein, I was wondering whether OP was aware of any promising, novel research in image understanding.
The most recent technology you mention, Transformers, underlies both LLMs and image synthesis/understanding. The parent posits that while computer vision lags behind NLP, this may not continue. Your comment shows that image synthesis and understanding have improved over time, but I'm not sure I follow the argument that they may soon catch up with, let alone leapfrog, LLMs (i.e., text understanding and synthesis).
My understanding is that chess.com and other online services detect cheating by comparing a player's moves against the "perfect" moves a chess engine would play.
Which gives credence to your theory that people aren't bringing much to the table.
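The comparison described above can be sketched as a simple engine-agreement heuristic: count how often a player's move matches the engine's top choice, and flag games where the match rate is implausibly high. Everything here (the function names, the 90% threshold, the toy move lists) is an illustrative assumption, not chess.com's actual, undisclosed method, which is far more sophisticated.

```python
def engine_match_rate(player_moves, engine_moves):
    """Fraction of positions where the player chose the engine's top move."""
    matches = sum(p == e for p, e in zip(player_moves, engine_moves))
    return matches / len(player_moves)

def looks_suspicious(player_moves, engine_moves, threshold=0.90):
    # Real systems use far richer statistics (centipawn loss, move times,
    # rating-adjusted baselines); a raw match rate is only a toy proxy.
    return engine_match_rate(player_moves, engine_moves) >= threshold

# Hypothetical example: the player matched the engine on 3 of 5 moves.
human  = ["e4", "Nf3", "Bb5", "O-O", "Re1"]
engine = ["e4", "Nf3", "Bb5", "Ba4", "O-O"]
print(engine_match_rate(human, engine))  # 0.6
```

A 60% match rate like this is unremarkable for a strong human; sustained near-perfect agreement across many games is what would trigger a closer look.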