They are both designed, trained, and evaluated on how well they predict the next token. It's literally what they do. "Reasoning" models just build up additional context out of their own next-token predictions, and RL is used to bias outputs toward ones that human judges find more appealing. It's not a meme. It's an accurate description of their fundamental computational nature.
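
To make the loop concrete, here is a toy sketch of next-token prediction. Everything in it is made up for illustration (the bigram counter, the names `train_bigram`, `predict_next`, `generate`, and the tiny corpus), and it is nowhere near a real LLM: a real model conditions on the whole context with a neural network, while this one only looks at the last token. The shape of the loop is the point: predict one token, append it to the context, repeat.

```python
from collections import Counter, defaultdict

def train_bigram(tokens):
    """Toy 'training': count which token follows which."""
    counts = defaultdict(Counter)
    for prev, nxt in zip(tokens, tokens[1:]):
        counts[prev][nxt] += 1
    return counts

def predict_next(counts, context):
    """Pick the most frequent follower of the last token in the context.

    Assumes a non-empty context. Returns None if the last token was
    never seen with a follower during training.
    """
    followers = counts.get(context[-1])
    if not followers:
        return None
    return followers.most_common(1)[0][0]

def generate(counts, prompt, max_new_tokens):
    """Autoregressive loop: each prediction is appended and fed back in."""
    context = list(prompt)
    for _ in range(max_new_tokens):
        nxt = predict_next(counts, context)
        if nxt is None:
            break
        context.append(nxt)
    return context

# Hypothetical corpus, chosen so there are no ties on the path below.
corpus = "the cat sat on the mat the cat sat on the rug".split()
model = train_bigram(corpus)
print(generate(model, ["the"], 4))  # ['the', 'cat', 'sat', 'on', 'the']
```

In this picture, a "reasoning" run is the same `generate` loop with a larger `max_new_tokens`: the extra tokens just become more context for the next prediction. RL would change which follower gets picked, not the loop itself.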