Honestly I feel AI has helped me be a better craftsman.
I can think "oh it would be kinda nice to add this little tidbit of functionality in this code". Previously I'd have to spend loads of time googling around the question in various ways, wording it differently etc, reading examples or digging a lot for info. This would sometimes just not be worth adding that little feature or nice to have.
Now, I can have Claude help me write some code, then ask it about various things I could add, or ways I could modify it or try it differently. It gives me more time to spend figuring out the best thing for the user, and more ideas for how to tackle a problem.
I don't let it code willy-nilly. I'm fairly precise in what I ask it to do, and that's only after I get it to explain how it would go about tackling a problem.
I still write my own code. What I have are a couple of LLM subscriptions (Perplexity and ChatGPT) that I regularly consult. I now ask them even the “silliest” questions.
The last couple of days, I tried having ChatGPT basically write an app for me, but it didn’t really turn out so well.
I look forward to being able to train an agent on my personal technique and style, as well as my quality bar, and have it write a lot of my code.
The question is: can an LLM actually power a true "agent" or can it just create a pretty decent simulation of one? When your tools are a bigger context window and a better prompt, are there some nails that are out of your capacity to hit?
We have made LLMs that need far less "prompt engineering" to give you something pretty decent than they did 2 years ago. That makes them WAY more useful as tools.
But then you hit the wall like you mention, or like another poster on this thread saw: "Of course, it's not perfect. For example, it gave me some authentication code that just didn’t work." This happens to me basically daily. And then I give it the error and ask it to modify. And then that doesn't work. And I give it this error. And it suggests the previous failed attempt again.
It's often still 90% of the way there, though, so the tool is pretty valuable.
But is "training on your personal quality bar" achievable? Is there enough high-quality draining data in the world, that it can recognize as high-quality vs low? Are the fundamentals of the prediction machine the right ones to be able to understand at-generation-time "this is not the right approach for this problem" given the huge variety and complexity in so many different programming languages and libraries?
TBD. But I'm a skeptic about that, because I've seen "output from a given prompt" improve a ton in 2 years, but I haven't seen that same level of improvement for "output after a really, really good prompt and some refinement instructions". I have to babysit it less, so I actually use it day to day way more, but it hits the wall in the same sort of unsurprising ways. (It's harder to describe than that - it's a "know it when you see it" thing. "Ah, yes, there's a subtlety that it doesn't know how to get past because there are so many wrinkles in a particular OAuth2 implementation, but it was so rare a case in the docs and examples that it's just looping on things that aren't working.")
(The personification of these things really fucks up the discussion. For instance, when someone tells me "no, it was probably just too lazy to figure out the right way" or "it got tired of the conversation." The chosen user-interface of the people making these tools really messes with people's perceptions of them. E.g. if LLM-suggested code that is presented as an in-line autocomplete by Copilot is wrong, people tend to be more like "ah, Copilot's not always that great, it got it wrong" but if someone asks a chatbot instead then they're much more likely to personify the outcome.)
I really don’t think training an agent on “your style” is the future. We’re more adaptable than the agents.
I think programming is a job people don’t need to do anymore and anyone who called themselves a software engineer is now a manager of agents. Jira is the interface. Define the requirements.
Writing your own code will still be a thing. We’ll even call those people hackers. But it will be a hobby.
> I think programming is a job people don’t need to do anymore and anyone who called themselves a software engineer is now a manager of agents. Jira is the interface. Define the requirements.
That Grand Canyon-sized logical leap quoted above ignores a vital concept: understanding.
To "define the requirements" sufficient enough for any given feature/defect/etc. requires a degree of formality not present in prose. This already exists and is in use presently:
> I can think "oh it would be kinda nice to add this little tidbit of functionality in this code". Previously I'd have to spend loads of time googling around the question in various ways, wording it differently etc, reading examples or digging a lot for info.
Research is how people learn. Or to learn requires research. Either way one wants to phrase it, the result is the same.
> Now, I can have Claude help me write some code, then ask it about various things I can add or modify it with or maybe try it differently.
LLMs are statistical text (token) generators and highly sensitive to the prompt given. More important in this context: the effort a person once expended doing research becomes, at best, an exercise in prompt refinement (if the person understands the problem context) or, at worst, an outsourcing of understanding (if they do not).
> I'm fairly precise in what I ask it to do and that's only after I get it to explain how it would go about tackling a problem.
Again, LLM algorithms strictly output statistically generated text derived from the prompt given.
LLM's do not "explain", as that implies understanding.
They do not "understand how it would go about tackling a problem", as that is a form of anthropomorphization.
We can go on all day about how an LLM doesn't explain and doesn't actually think. In the end, though, I've found myself able to do things better and faster, especially in a codebase I have no experience with, working with developers who aren't able to help me in the moment given our timezone differences.
> We can go on all day about how an LLM doesn't explain and doesn't actually think.
This is an important concept IMHO. Maintaining a clear understanding of what a tool is useful for and what it is not allows for appropriate use.
> In the end though, I've found myself being able to do things better and faster especially given a codebase I have no experience in ...
Here, I have to reference what I wrote before:
> Research is how people learn. Or to learn requires research. Either way one wants to phrase it, the result is the same.
If you don't mind me asking two philosophical questions:
How can one be confident that altering a codebase one has no experience with will make it "better", without understanding it?
Knowing an LLM produces the most statistically relevant response to any given query (which is orthogonal to the concepts of true/false/right/wrong/etc.), and also knowing one has no experience with a codebase, how can one be confident that whatever the LLM responds with is relevant/correct/useful?
The thing about code is you can run it to confirm it does what you want it to do and doesn't do what you don't want it to do. Sprinkle in some software experience in there as well.
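To make that concrete, here's a minimal sketch of what "run it to confirm" can look like in practice: pinning down an LLM-suggested helper with a couple of tests. The `Slugify` helper and the xUnit tests are invented for illustration, not taken from anyone's actual project.

```csharp
using System;
using Xunit;

public static class TextHelpers
{
    // Hypothetical LLM-suggested helper; the point is that its behavior
    // can be confirmed by running tests rather than taken on trust.
    public static string Slugify(string input) =>
        string.Join("-", input.ToLowerInvariant()
            .Split(' ', StringSplitOptions.RemoveEmptyEntries));
}

public class TextHelpersTests
{
    [Fact]
    public void DoesWhatWeWant()
    {
        Assert.Equal("hello-world", TextHelpers.Slugify("Hello  World"));
    }

    [Fact]
    public void DoesNotDoWhatWeDontWant()
    {
        // e.g. padded input should not produce leading or trailing separators
        Assert.Equal("hello", TextHelpers.Slugify(" Hello "));
    }
}
```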
I'm a .NET developer working on backend for the last 7 years. I used to work with WinForms, WebForms, and ASP.NET MVC 5. Lately, I've been wanting to get back into frontend development using Blazor.
I have the $10 GitHub Copilot plan, and I've been "vibe coding" with it. I asked the AI to give me an example of a combo box, and it gave me something to start with. I used a Bootstrap table and asked the AI to help with pagination, and it provided workable code. Of course, to a seasoned frontend developer what I'm doing is simple, but I haven't worked on frontend in so long that vibing with the AI has been a good experience.
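For anyone who hasn't touched Blazor, the kind of paginated Bootstrap table described above can be sketched roughly like this. It's an illustrative guess at the shape of that code, not the actual Copilot output; the `Person` record, sample data, and page size are made up.

```razor
@* Minimal sketch: Bootstrap-styled table with client-side paging via Skip/Take. *@
<table class="table table-striped">
    <thead><tr><th>Name</th><th>Email</th></tr></thead>
    <tbody>
        @foreach (var p in PagedItems)
        {
            <tr><td>@p.Name</td><td>@p.Email</td></tr>
        }
    </tbody>
</table>

<button class="btn btn-secondary" disabled="@(page == 0)" @onclick="() => page--">Previous</button>
<button class="btn btn-secondary" disabled="@IsLastPage" @onclick="() => page++">Next</button>

@code {
    // Made-up sample data; in a real app this would come from the backend.
    record Person(string Name, string Email);

    readonly List<Person> items = Enumerable.Range(1, 42)
        .Select(i => new Person($"User {i}", $"user{i}@example.com"))
        .ToList();

    const int PageSize = 10;
    int page;

    IEnumerable<Person> PagedItems => items.Skip(page * PageSize).Take(PageSize);
    bool IsLastPage => (page + 1) * PageSize >= items.Count;
}
```

(`List<T>` and the LINQ calls rely on the implicit usings of a .NET 6+ project.)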
Of course, it's not perfect. For example, it gave me some authentication code that just didn’t work. Still, I’ve found AI to be like a “smart” Google: it doesn’t judge me or tell me my question is a duplicate like Stack Overflow does.