I think there is a common problem with a lot of these ML systems: the answers look perfectly correct to someone who isn't a domain expert. For example, I ask legal questions and it gives me fake case numbers that I have no way of knowing are fake until I look them up. Same with coding: I asked for a patch for a public project that has a custom !regex-style match engine. It does an amazing job, cross-referencing two different projects and handing me a very probable-looking patch. Then I ask for a couple of changes, one of which can't actually be done, and it produces syntax that doesn't even compile because it's using 'x' as a stand-in for the bits it doesn't have an answer for.
In the end, I had to go spend a couple of hours reading the documentation to understand the matching engine, and the final patch didn't look anything like the LLM-generated code. Which is what seems to happen all the time: it's wonderful for spewing out the boilerplate, but for the actual problem-solving portions it's like talking to someone who simply doesn't understand the problem and keeps giving you what it has, rather than what you want.
OTOH, it's fantastic for review etc., even though I tend to ignore many of the suggestions. It's like a grammar checker of old: it will point out a comma you missed, but half the time the suggestions are wrong.