
If you do a lot of work in an area that o1 is strong in - $200/month effectively rounds down to $0 - and a single good answer at the right time could justify that entire $200 in a single go.


I feel like a single bad answer at the wrong time could cost a heck of a lot more than $200. And these LLMs are riddled with bad answers.


Think of it as an intern. Don't trust everything they say.


It's so strange to me that in a forum full of programmers, people don't seem to understand that you set up systems to detect errors before they cause problems. That's why I find ChatGPT so useful for helping me with programming - I can tell if it makes a mistake because... the code doesn't do what I want it to do. I already have testing and linting set up to catch my own mistakes, and those things also catch AI's mistakes.
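
(To make that concrete, a minimal, hypothetical sketch with pytest; the module and function names are made up, but the point is that the same guard rail catches a mistake regardless of whether a human or an LLM wrote the code.)

    # test_pricing.py -- hypothetical example
    from pricing import apply_discount   # imaginary module under test

    def test_discount_never_goes_negative():
        # A naive implementation that just subtracts would return -5 here;
        # the test fails the same way whether I or an LLM wrote that version.
        assert apply_discount(price=10, discount=15) == 0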


Thank you! I always feel so weird actually using chatgpt without any major issues while so many people keep claiming how awful it is; it's like people want it 100% perfect or nothing. For me, if it gets me 80% of the way there in 1/10 the time and then I do the final 20% myself, that's still a heck of a productivity boost, basically for free.


Yep, I’m with you. I’m a solo dev who never went to college… o1 makes far fewer errors than I do! No chance I’d make it past round one of any sort of coding tournament. But I managed to bootstrap a whole saas company doing all the coding myself, which involved setting up a lot of guard rails to catch my own mistakes before they reached production. And now I can consult with a programming intelligence the likes of which I could never afford to hire if it was a person. It’s amazing.


Is it working?


Not sure what you're referring to exactly. But broadly yes it is working for me - the number of new features I get out to users has sped up greatly, and stability of my product has also gone up.


Are you making money with your saas idea?


Yep, been living off it for nine years now


Congratulations! That is not an easy task. I am just starting the journey.


Famously, the last 10% takes 90% of the time (or 20/80 in some approximations). So even if it gets you 80% of the way in 10% of the time, maybe you don’t end up saving any time, because all the time is in the last 20%.

I’m not saying that LLMs can’t be useful, but I do think it’s a darn shame that we’ve given up on creating tools that deterministically perform a task. We know we make mistakes and take a long time to do things. And so we developed tools to decrease our fallibility to zero, or to allow us to achieve the same output faster. But that technology needs to be reliable; and pushing the envelope of that reliability has been a cornerstone of human innovation since time immemorial. Except here, with the “AI” craze, where we have abandoned that pursuit. As the saying goes, “to err is human”; the 21st-century update will seemingly be, “and it’s okay if technology errs too”. If any other foundational technology had this issue, it would be sitting unused on a shelf.

What if your compiler only generated the right code 99% of the time? Or if your car only started 9 times out of 10? All of these tools can be useful, but when we are so accepting of a lack of reliability, more things go wrong, and potentially at larger and larger scales and magnitudes. When (if some folks are to be believed) AI is writing safety-critical code for an early-warning system, or deciding when to use bombs, or designing and validating drugs, what failure rate is tolerable?


> Famously, the last 10% takes 90% of the time (or 20/80 in some approximations). So even if it gets you 80% of the way in 10% of the time, maybe you don’t end up saving any time, because all the time is in the last 20%.

This does not follow. By your own assumptions, getting you 80% of the way there in 10% of the time would save you 18% of the overall time, if the first 80% typically takes 20% of the time. 18% time reduction in a given task is still an incredibly massive optimization that's easily worth $200/month for a professional.


Using the 90/10 split: the first chunk of the work used to take 10% of the time; reducing that to a tenth of itself yields a 9% time savings.

160 hours a month * $100/hr programmer * 9% = $1440 in savings, easily enough to justify $200/month.

Even if it fails 1/10th of the time, that is still ~8%, or roughly $1300 in savings.
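
(As a quick sanity check, a back-of-envelope sketch in Python; all of the inputs are the assumed figures above, not measured data.)

    hours_per_month = 160
    hourly_rate = 100            # dollars
    time_saved = 0.09            # 9% of total time, from the 90/10 split
    failure_rate = 0.10          # assume 1 in 10 attempts saves nothing

    print(hours_per_month * hourly_rate * time_saved)                       # 1440.0
    print(hours_per_month * hourly_rate * time_saved * (1 - failure_rate))  # 1296.0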


Does that count the time you spend on prompt engineering?


It depends what you’re doing.

For tasks where bullshitting or regurgitating common idioms is key, it works rather well and indeed takes you 80% or even close to 100% of the way there. For tasks that require technical precision and genuine originality, it’s hopeless.


I'd love to hear what that is.

So far, given my range of projects, I have seen it struggle with lower level mobile stuff and hardware (ESP32 + BLE + HID).

For things like web (front/back), DB, video games (web and Unity), it does work pretty well (at least 80% there on average).

And I'm talking about the free version, not this $200/mo one.


Well, that is a very specific set of skills. I bet the C-suite loves it.


> I always feel so weird actually using chatgpt without any major issues while so many people keep claiming how awful it is;

People around here feel seriously threatened by ML models. It makes no sense, but then, neither does defending the Luddites, and people around here do that, too.


Well now at $200 it's a little farther away from free :P


What do you mean? ChatGPT is free, the Pro version isn't.

I'm talking about the generally available one; I haven't had the chance to try this new version.


I could buy a car for that kind of money!


Of course, but for every thoroughly set up TDD environment, you have a hundred other people just blindly copy pasting LLM output into their code base and trusting the code based on a few quick sanity checks.


You assume programming software with an existing well-defined and correct test suite is all these will be used for.


>I can tell if it makes a mistake because... the code doesn't do what I want it to do

Sometimes it does what you want it to do, but still creates a bug.

I asked the AI to write some code to get a list of all objects in an S3 bucket. It wrote code that worked, but it didn't account for the fact that S3 returns objects in pages of at most 1000 items. If the bucket contained fewer than 1000 objects (typical when first starting a project), things worked; but if the bucket contained more than 1000 objects (easy to reach on S3 in a short amount of time), you'd hit a subtle but important bug.

Someone not already intimately familiar with the inner workings of S3 APIs would not have caught this. It's anyone's guess if it would be caught in a code review, if a code review is even done.
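
(For reference, a minimal pagination-aware sketch in Python with boto3; this is not the code the AI produced, just the pattern it missed.)

    import boto3

    def list_all_object_keys(bucket):
        """Collect every key in the bucket, not just the first page.

        list_objects_v2 returns at most 1,000 keys per call, so follow the
        paginator's continuation tokens until the listing is complete.
        """
        s3 = boto3.client("s3")
        keys = []
        for page in s3.get_paginator("list_objects_v2").paginate(Bucket=bucket):
            # "Contents" is absent when a page (or the bucket) is empty.
            keys.extend(obj["Key"] for obj in page.get("Contents", []))
        return keys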

I don't ask the AI to do anything complicated at all, the most I trust it with is writing console.log statements, which it is pretty good at predicting, but still not perfect.


So the AI wrote a bug; but if humans wouldn’t catch it in code review, then obviously they could have written the same bug. Which shouldn’t be surprising because LLMs didn’t invent the concept of bugs.

I use LLMs maybe a few times a month but I don’t really follow this argument against them.


Code reviewing is not the same thing as writing code. When you're writing code you're supposed to look at the documentation and do some exploration before the final code is pushed.

It would be pretty easy for most code reviewers to miss this type of bug in a code review, because they aren't always looking for that kind of bug, they aren't always looking at the AWS documentation while reviewing the code.

Yes, people could also make the same error, but at least they have a chance at understanding the documentation and limits where the LLM has no such ability to reason and understand consequences.


it also catches MY mistakes, so that saves time


So true, and people seem to gloss over this fact completely. They only talk about correcting the LLM's code while the opposite is much more common for me.


I would hesitate to hire an intern that makes incorrect statements with maximum confidence and with no ability to learn from their mistakes.


When you highlight only the negatives, yeah, it does sound like no one should hire that intern. But what if the same intern happens to have an encyclopedia for a brain and can pore over massive documents and codebases to spot and fix countless human errors in a snap?

There seem to be two camps: people who want nothing to do with such flawed interns, and people who are trying to figure out how to amplify and utilize the positive aspects of such flawed yet powerful interns. I'm choosing to be in the latter camp.


Those are fair points, I didn't mean to imply that there are only negatives, and I don't consider myself to be in the former camp you describe as wanting nothing to do with these "interns". I shouldn't have stuck with the intern analogy at all since it's difficult for me to compare the two, with one being fairly autonomous and the other being totally reliant on a prompter.

The only point I wanted to make was that an LLM's ability and propensity to generate plausible falsehoods should, in my opinion, elicit a much deeper sense of distrust than one feels for an intern, enough so that comparing the two feels a little dangerous. I don't trust an intern to be right about everything, but I trust them to be self aware, and I don't feel like I have to take a magnifying glass to every tidbit of information they provide.


Nothing chatgpt says comes with maximum confidence. The EULA and terms of use are riddled with "no guarantee of accuracy" and "use at own risk".


No, they're right. ChatGPT (and all chatbots) responds confidently while making simple errors. Disclaimers upon signup or in tiny corner text are so at odds with the actual chat experience.


What I meant to say was that the model uses the verbiage of a maximally confident human. In my experience the interns worth having have some sense of the limits of their knowledge and will tell you "I don't know" or qualify information with "I'm not certain, but..."

If an intern set their Slack status to "There's no guarantee that what I say will be accurate, engage with me at your own risk," that wouldn't excuse their attempts to answer every question as if they wrote the book on the subject.


I think the point is that an LLM almost always responds with the appearance of high confidence. It will hallucinate much sooner than say "I don't know."


And we, as humans, are having a hard time compartmentalizing and forgetting our lifetimes of language cues, which typically correlate with attention to detail, intelligence, time investment, etc.

New technology allows those signs to be counterfeited quickly and cheaply, and it tricks our subconscious despite our best efforts to be hyper-vigilant. (Our brains don't want to do that; it's expensive.)

Perhaps a stopgap might be to make the LLM say everything in a hostile villainous way...


They aren't talking about EULAs. It's how they give out their answers.


If I have to do the work to double-check all the answers, why am I paying $200?


Why do companies hire junior devs? You still want a senior to review the PRs before they merge into the product, right? But the net benefit is still there.


We hire junior devs as an investment, because at some point they turn into seniors. If they stayed juniors forever, I wouldn't hire them.


I started incorporating LLMs into my workflows around the time gpt-3 came out. By comparison to its performance at that point, it sure feels like my junior is starting to become a senior.


Are you implying this technology will remain static in its capabilities going forward despite it having seen significant improvement over the last few years?


No, I'm explicitly saying that gpt-4o-2024-11-20 won't get any smarter, no matter how much I use it.


Does that matter when you can just swap it for gpt-5-whatever at some point in the future?


Someone asked why I hire juniors. I said I hire juniors because they get better. I don't need to use the model for it to get better, I can just wait until it's good and use it then. That's the argument.


I suppose the counterargument would be your investment in OpenAI allows them to fund the better model down the road, but I get your drift :)


Genuinely curious, are you saying that your junior devs don't provide any value from the work they do?


They provide some value, but between the time they take in coaching, reviewing their work, support, etc., I'm fairly sure one senior developer has a much higher work-per-dollar ratio than a junior.


Because double-checking and occasionally hitting retry is still 10x faster than doing it all myself.


Because you wouldn't have come up with the correct answer before you used up 200 dollars worth of salary or billable time.


because checking the work is much faster than generating it.


Because it's per month and not per hour for a specialist consultant.


I don't know anyone who does something and first says, "This will be a mistake." Maybe they say, "I am pretty sure this is the right thing to do," and then they make a mistake.

If it's easier mentally, just put that second sentence in front of every chatgpt answer.

Yeah the Junior dev gets better, but then you hire another one that makes the same mistakes, so in reality, on an absolute basis, the junior dev never gets any better.


Yeah, but you personally don't pay $200/month out of your pocket for the intern. Heck, in Canada the government actually gives rebates for hiring interns and co-ops.


Then the lesson you have learned is “don’t blindly trust the machine”

Which is a very valuable lesson, worth more than $200


Easy: don't trust the answers. Verify them.


Even in this case, losing $200 + whatever vs. a tiny bit higher chance of losing $20 + whatever makes Pro seem like a good deal.


Doesn't that completely depend on those chances and the magnitude of +whatever?

It just seems to me that you really need to know the answer before you ask it to be over 90% confident in the answer. And the more convincing sounding these things get the more difficult it is to know whether you have a plausible but wrong answer (aka "hallucination") vs a correct one.

If you have a need for a lot of difficult to come up with but easy to verify answers it could be worth it. But the difficult to come up with answers (eg novel research) are also where LLMs do the worst.


Compared to knowing things and not losing whatever, both are pretty bad deals.


What specific use cases are you referring to where that poses a risk? I've been using LLMs for years now (both directly and as part of applications) and can't think of a single instance where the output constituted a risk or where it was relied upon for critical decisions.


That's why you have a human in the loop responsible for the answer.


Presumably, this is what they want the marks buying the $200 plan to think. Whether it's actually capable of providing answers worth $200 and not just sweet talking is the whole question.



