"Humanity’s Last Exam" may be a rigorous test of fact-check and pattern recognition, but does it truly measure intelligence, or just a model's ability to absorb and regurgitate structured knowledge?
I mean, it's one thing to do maths and find historical facts and figures (Google made that very easy), but a whole other thing to ask meaningful, novel questions.
I think true AGI isn’t about scoring an A on an exam, but about the moment an AI starts asking questions humans never thought to ask.
Claude already does this sometimes. It follows up with a question to gather more info, or sometimes even to lead me towards a solution. Even if that curiosity is synthetic, it's still very useful.
The real question is whether the AI can come up with ideas independently, is curious enough to develop its own questions, and can find answers. In the case of Claude or any AI model, the first input is always from a human. The prompt is the more valuable part because, without it, the AI would have all the data and computational strength in the world but wouldn't know what to do with it.