That the answers have been available to them in the environment, and they’re sti...

raincole · 2025-09-11T20:13:52 1757621632

It really isn't. Do you expect SOTA models to answer any answered question on the internet with 100% accuracy? Congrats you just compressed the whole internet (at least a few zettabytes) into a model (a few TB at most?).

OtherShrezzing · 2025-09-11T20:31:07 1757622667

The linked ticket isn’t suggesting the commit is in the training data. It’s demonstrating that models run ‘git log’, find the exact code to fix the issue against which they’ll be scored, and then they implement that code as-is.

The test environment contains the answers to the questions.

imiric · 2025-09-12T09:00:04 1757667604

Well, we're dealing with (near) superintelligence here, according to the companies that created the models. Not only would I expect them to regurgitate the answers they were trained on, which includes practically the entire internet, but I would expect them to answer questions they weren't trained on. Maybe not with 100% accuracy, but certainly much higher than they do now.

It's perfectly reasonable to expect a level of performance concordant with the marketing of these tools. Claiming this is superintelligence, while also excusing its poor performance is dishonest and false advertising.

Tanjreeve · 2025-09-13T04:59:10 1757739550

Why does this matter if these models are a super intelligence with reasoning etc and don't need the answers sucked off the internet?

aurareturn · 2025-09-11T20:11:12 1757621472

Are you going to rail on humans for making this mistake in the first place?

themafia · 2025-09-11T20:14:04 1757621644

No because that's the baseline. It's what you do when you have no other choice. Railing against that would be pointless.

ares623 · 2025-09-11T20:57:35 1757624255

i mean, if a human was claiming they could do that and successfully received billions to attempt to do it, and fail to deliver, i'd be railing against that particular human too