Hacker News | isaacfrond's comments

That’s assuming the administration will allow you to administer the exams onsite which is increasingly not the case. Online students bring in more money.


> The claim of inevitability is crucial to technology hype cycles, from the railroad to television to AI.

Well. You know. We still have plenty of railroad, and television has had a pretty good run too. So if those are the models to compare AI to, then I have bad news about how much of a 'hype cycle' AI is going to be.


The article itself lists as successful, even breakthrough, applications of AI: protein folding, weather forecasting, and drug discovery.


That is some cool jazz


No kidding. First AIs were passing the bar exam—now they’re writing it.

I don't really see the scandal though. I pretty much _only_ use AI for tasks that I could do myself.


Some of the Gemini stuff is almost at airport level. I'm surprised. Everything is going so fast.

The odd thing is that with technical stuff, I'm continually rewriting the LLM's output to be clearer and less verbose, while with fiction it's almost the opposite--not literary enough.


I wonder how well humans would do in this chart.


Author here - I'm planning to create game versions of this benchmark, as well as my other multi-agent benchmarks (https://github.com/lechmazur/step_game, https://github.com/lechmazur/pgg_bench/, and a few others I'm developing). But I'm not sure if a leaderboard alone would be enough for comparing LLMs to top humans, since it would require playing so many games that it would be tedious. So I think it would be just for fun.


I was inspired by your project to start making similar multi-agent reality simulations. I’m starting with the reality game “The Traitors” because it has interesting dynamics.

https://github.com/michaelgiba/survivor (elimination game with a shoutout to your original)

https://github.com/michaelgiba/plomp (a small library I added for debugging the rollouts)


Very cool!


If you watch the top-tier social deduction players on YouTube (things like Blood on the Clocktower, etc.), they'd figure out weaknesses in the LLM and exploit them immediately.


Testing against people like that would be the way to do it. Otherwise it’s like testing a chess engine against casual players or worse.


I'm interested in seeing how the LLMs react to some specific defined strategies. E.g. an "honest" bot that says "I'm voting for player [random number]." and does it every round (not sure how to handle the jury step). Do they decide to keep them around for longer, or eliminate them for being impossible to reason with if they pick you?
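A minimal sketch of such an "honest" bot, assuming a hypothetical player interface (the class name, `speak`/`vote` methods, and round structure are all my invention; a real benchmark harness would differ):

```python
import random


class HonestBot:
    """Fixed-strategy player: announces a random vote target, then always follows through."""

    def __init__(self, player_id, rng=None):
        self.player_id = player_id
        self.rng = rng or random.Random()
        self.target = None

    def speak(self, alive_players):
        # Pick a random living opponent and announce the vote honestly.
        candidates = [p for p in alive_players if p != self.player_id]
        self.target = self.rng.choice(candidates)
        return "I'm voting for player %d." % self.target

    def vote(self):
        # Do exactly what was announced -- no deception, every round.
        return self.target


# Usage: one round with players 0..4, from bot 2's perspective.
bot = HonestBot(player_id=2, rng=random.Random(42))
statement = bot.speak([0, 1, 2, 3, 4])
vote = bot.vote()  # always matches the announcement
```

The interesting experimental variable is then entirely on the LLM side: whether the other players model this bot as predictable-and-safe or as unpersuadable-and-dangerous.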


Yes, predefined strategies are very interesting to examine. I have two simple ones in another multi-agent benchmark, https://github.com/lechmazur/step_game (SilentGreedyPlayer and SilentRandomPlayer), and it's fascinating to see LLMs detect and respond to them. The only issue with including them here is that the cost of running a large set of games isn't trivial.

Another multi-agent benchmark I'm currently developing, which involves buying and selling, will also feature many predefined strategies.


Well, by definition we are simulating LLMs just fine, but per the article we are utterly failing at C. elegans, so it seems the smart money is on the latter.


Doesn’t mean it’s more complex though.

It just means the complexity is harder to capture and copy.

LLMs are built via algorithm. Given enough data and a large enough neural network, the complexity of an LLM is boundless. I guess my question is: are existing LLMs more complex?


LLM research has a much bigger budget than C. Elegans research.


That says more about the industrialization of scientific research than anything.

LLMs are the new hype product of tech companies. Wait a couple of years and the interest will die out.

Maybe we'll live to see the day when true neuroscience (none of that Andrew Huberman stuff) is trending.


More complex and more advanced are not the same thing. Evolution produces a lot of twisty little passages that are only that way because it happened to work.



Reminds me of a quote by the famous number theorist Hendrik Lenstra:

For every problem you can't solve, there's a simpler problem that you also can't solve.


Is this quote real? I'm familiar with George Pólya's, "If you cannot solve the proposed problem, try to solve first a simpler related problem" but I cannot find any source for the Lenstra quote.


I also found it attributed to Pólya [1].

[1] https://www.pleacher.com/mp/mquotes/mobquote.html



I’ve heard him say it myself in a lecture on the AKS primality test. So, ehh, the source is oral tradition I guess.


That doesn’t induce nicely. Unless it was an insult.


It’s not induction; it’s just the contrapositive of “if you can solve the simpler problem, then you can solve the harder problem.”
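For what it's worth, the contrapositive reading can be written out (with S(p) as shorthand for "problem p can be solved" and q a simpler problem than p; the notation is mine, not Lenstra's):

```latex
% Contrapositive: (A \Rightarrow B) is logically equivalent to
% (\lnot B \Rightarrow \lnot A). Here A = S(q), B = S(p):
\[
  \bigl(S(q) \Rightarrow S(p)\bigr)
  \;\equiv\;
  \bigl(\lnot S(p) \Rightarrow \lnot S(q)\bigr)
\]
```

The right-hand side is exactly the quote: if you can't solve p, there is a simpler q you also can't solve.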


Monotonic sequences can be bounded!


If you could solve the simpler problems, you’d be able to solve the larger problem. But you can’t, because you can’t even solve a simple problem.


It reads like the famous Churchill quote about "if you gave me poison I would drink it"

