Sevii's comments | Hacker News

I don't understand how people feel comfortable writing 'LLMs are done improving, this plateau is it' when we haven't even gone an entire calendar year without seeing improvements to LLM-based AI.

You can expand to cover more of a plateau; that is still an improvement, but it's also still a plateau, since you aren't going any higher.

So models improve on specific tasks, but they no longer really improve across the board.


I wonder if the people saying that would agree that they've been improving.

As someone on that side of the argument, I have to say I have yet to see LLMs fundamentally improve, rather than being benchmaxxed on a new set of common "trick questions" and giving off the illusion of reasoning.

Add an extra leg to any animal in a picture. Ask the vision LLM to tell you how many legs it sees. It will answer with the number a person would expect from a healthy animal, because it's not actually reasoning, it's not perceiving anything, it's pattern matching. It sees a dog, it answers 4 legs. Maybe sometime in the future it won't do that, because they will add this kind of trick to their benchmaxxing set (training LLMs specifically on pictures that have fewer or more legs than the animal should), as they do every time there's a new generation of these illusory things. But that won't fix the fundamental problem: these things DO NOT REASON.

Training LLMs on sets of thousands and thousands and thousands of reasoning trick questions people ask on LM Arena is borderline scamming people about the true nature of this technology. If we lived in a sane regulatory environment, OAI would have a lot to answer for.


Someone is going to make a new OS based around LLMs soon. It might end up being mobile, but desktop could work. Voice and tool calling are close to being good enough.

It's still too early, but at some point we are going to start to see infra and frameworks designed to be easier for LLMs to use: a version of Terraform intended for AI, or an edition of the AWS API for LLMs.


For factory optimization they didn't ask people to track their hours at 7-minute intervals. They put a camera on them and analyzed what they did over an entire shift. Then they redesigned the factory line to optimize things. Having people manually track time has far more overhead and is less accurate in general. In practice I've never seen a software org take estimates or time tracking seriously beyond "Track your hours in 15-minute increments against tickets".


I will have this written on my gravestone if Edward Tufte doesn’t beat me to it:

Charts are for asking questions, not answering them.

That analysis phase can be done brilliantly, or disastrously. It’s down to whose hand is on the rudder.


7.5 would be nicer, since it divides evenly into sixty minutes!


I'm getting Claude Code to one-shot deployment scripts. It's pretty great.


I use Claude Code for all of my interactions with Claude now. It's a way better experience.


I've spent far too much of my life writing boilerplate and API integrations. Let Claude do it.


I agree. It’s a lot faster to tell it what I want and work on something else in the meantime. You end up reading code diffs more than writing code, but it saves time.


I have Simon Willison's blog in my RSS feed. https://simonwillison.net


+1 Pretty much the best source for tech laypeople.

How do I stay on top of AI tech? I let other people (like simonw) do most of the leg work for me.


Yes, that, and AI Explained [1] on YouTube - no BS hype, strong yet approachable analysis, and an in-depth private community if you want a deeper dive.

https://www.youtube.com/@aiexplained-official


thanks for sharing - subscribed!


I recently used Claude Code and Go to build a metric-receiving service for a weather station project. I had it build all the code for this: it created the SQL statements, the handlers, the deployment scripts, the systemd service files, the tests, and a utility to generate API keys.

I think you are using the wrong language, to be honest. LLMs are best at languages like Python, JavaScript, and Go: relatively simple structures and huge amounts of reference code. Rust is a less common language that is much harder to write.

Did you give Claude Code tests and the ability to compile in a loop? It's pretty good, in Go at least, at debugging and fixing issues when allowed to loop.


How do you give cc the ability to compile in a loop?


Simply tell it what command it should call to compile, or better yet, add that info to Claude.md.
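
For example, a Claude.md might contain something like this (the commands below are just placeholders for a typical Go project, adjust to whatever your build actually uses):

    ## Build and test
    - Build: go build ./...
    - Test: go test ./...
    - Vet: go vet ./...

    After every change, run the build and tests and fix any failures before finishing.

With that in place it will compile, read the errors, and keep iterating until things pass.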


Leetcode questions are still the primary way to test skill in interviews.


Where? I have candidates solve a real closed-ended problem in the space we’re working in. I also give them a lot of source code to read, respond to, and find issues with.


But this approach takes a certain amount of time and effort, and requires interviewers who are skilled at interviewing.

Leetcode provides an extremely lazy and cheap way of “solving” the problem of assessing an applicant’s skills.


most medium to large size companies


Plants not being able to chew or tear their prey is a big disadvantage.


Not if you're prey. I'd rather not have more stuff trying to eat me :P


Baleen whales seem to do just fine without it.

