> They can get halfway there and then struggle immensely.
Restart the conversation from scratch. As soon as you get something incorrect, begin from the beginning.
It seems to me like any mistake in a messages chain/conversation instantly poisons the output afterwards, even if you try to "correct" it.
So if something was wrong at one point, you need to go back to the initial message, and adjust it to clarify the prompt enough so it doesn't make that same mistake again, and regenerate the conversation from there on.
Chatbot UIs really need better support for conversation branching all around. It's very handy to be able to just right-click on any random message in the conversation in LM Studio and say, "branch from here".
Maybe it's contrarian, maybe it's not, but I don't think Chat UIs are well suited for software engineering/programming at all, we need something completely different. Being able to branch conversations and such would be useful, but probably not for the way I do software. Besides, I'm rarely beyond 3 messages (1 system, 1 user, 1 assistant) in any usage of the chat UIs. Maybe it's more useful to people with different workflows.
I don't see how you'd avoid using chat if you need the bot to work on some bug end-to-end. I usually have many rounds in a chat session, first asking it to identify the overall approach, reviewing and approving that, then one or more rounds for coding, and several more to request edits as needed.
If you only ever ask it for trivial changes that don't require past context to make sense, then chat is indeed overkill. But we already have different UX approaches for that - e.g. some IDEs watch for specially formatted comments to trigger code generation, so you literally just type what you want right there in the editor, exactly where you want the code to go.
Yeah, I'd agree you want to iterate, but I'm not sure the UX of "Log of messages, where some of yours, some are tool calls, others are the assistant" and the workflow of "Add more messages into the log of messages"/"Change existing messages" is the right broad UX for this type of work.
I'm sorry I can't substantiate it more than that, as my own head is still trying to wrap itself around what I think is needed instead. Still, sounds very "fluffy" even when I read it back myself.
It does indeed. What I'm saying is that, for some mysterious reason, none of the first-party chatbot apps do that - ChatGPT, Claude, Gemini all lack this feature.
AI Studio has this, I usually ask it to plan and I do some rounds of refining until the plan covers all my requirements, then I branch this conversation, a branch for each feature, none of the branches get polluted this way.
Can you imagine if Excel worked like this? the formula put out the wrong result, so try again! It's like that scene from The Office where Michael has an accountant "run it again." It's farcical. They have created computers that are bad at math and I will never forgive them.
Also, each try costs money! You're pulling the lever on a god damned slot machine!
I will TRY AGAIN with the same prompt when I start getting a refund for my wasted money and time when the model outputs bullshit, otherwise this is all confirmation and sunk cost bias talking, I'm sure if it.
I mean, why would I imagine that? Who would want that? It's like the argument against legal marijuana, and someone replies "But would you like your pilot to be high when flying?!". Right tool for the right job, clearly when you want 100% certainty then LLMs aren't the tool for that. Just because they're useful for some things don't mean we have to replace everything with them.
> Also, each try costs money!
I guess you're using some paid API? Try a different way then. I mostly use the web UI from OpenAI, or Codex lately, or ran locally with my own agent using local weights, neither is "each try costs money" more than writing data to my SSD is costing me money.
It's not a holy grail some people paint it, and not sure we're across the "productivity threshold" (https://news.ycombinator.com/item?id=44160664) yet, but it's worth trying it out probably before jumping to conclusions. But no one is forcing you either, YMMV and all that.
I thought Claude still has a problem generating the same output for the same input? That you can't just rewind and rerun and get to the same point again.
> I thought Claude still has a problem generating the same output for the same input?
I haven't used Anthropic's models/software in a long time (months, basically forever in AI ecosystem), so don't know exactly how it works now.
But last time I used Claude, you could edit the first message, and then re-generate the assistants next message based on your edit. Most of the LLM interfaces has one or another way of doing this, I can't imagine they got rid of that feature.
What I'm suggesting isn't to use the exact same input (the first message), but rather change it so you remove the chances of something incorrect happening later after that.
Good engineering? You want automated steps to be repeatable so you know your tweak to the previous conversation have the effect you desire. Though using an AI for coding is probably closer in spirit the the art of writing code than the engineering of writing code and art is pretty much unrepeatable by definition.
Fair enough. Use the respective API or Google Gemini which will let you set temperature to zero resulting in deterministic output barring FP errors accumulating when paired with non-standard GPU/TPU configurations. Likely not to differ by much in the vast majority of cases though.
Restart the conversation from scratch. As soon as you get something incorrect, begin from the beginning.
It seems to me like any mistake in a messages chain/conversation instantly poisons the output afterwards, even if you try to "correct" it.
So if something was wrong at one point, you need to go back to the initial message, and adjust it to clarify the prompt enough so it doesn't make that same mistake again, and regenerate the conversation from there on.