
>My limited testing seems to show LLM assistants do well the more popular the language is (more data in its training), so is the hurdle for adoption of something new going to get even higher

Not only that, they also tend to answer using the more popular languages or tools even when it is NOT necessary. And when you call one out on it, it responds with something like:

"you are absolutely right, this is not necessary and potentially confusing. Let me provide you with a cleaner, more appropriate setup...."

Why doesn't it just respond that way the first time? The code it provided works, but it's very convoluted. If it isn't checked carefully by an experienced dev who knows to ask the right follow-up question, one would never get the second answer, and then that vibe code just ends up in a git repo and gets deployed all over the place.

I get the feeling some big corp may have just paid money to get their plugin/code into the first answer even when it is NOT necessary.

This could be very problematic. I'm sure people in advertising are all licking their chops over how they can capitalize on that. If you think the ad industry is bad now, wait until it is infused into all the models.

We really need ways to

1. Train our own models in the open, with the weights and the data they are trained on. Kinda like the reproducible build process that Nix does for building repos.

2. Ways to debug the model at inference time. The <think> tag is great, but I suspect not everything is transparent in that process.
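To give a rough idea of what I mean by point 2, here's a minimal sketch, assuming a model whose raw output wraps its reasoning in <think>...</think> tags; the raw_response string and the helper name are just illustrative, not any particular vendor's API:

    import re

    def split_reasoning(raw_response: str):
        """Separate the <think> trace from the final answer so both can be logged and audited."""
        match = re.search(r"<think>(.*?)</think>", raw_response, flags=re.DOTALL)
        trace = match.group(1).strip() if match else ""
        answer = re.sub(r"<think>.*?</think>", "", raw_response, flags=re.DOTALL).strip()
        return trace, answer

    # Hypothetical raw output from a reasoning model
    raw_response = "<think>User asked for a simple setup; no extra plugin is needed.</think>Here is a minimal config..."

    trace, answer = split_reasoning(raw_response)
    print("REASONING TRACE:\n", trace)   # check this for tooling/product choices that were never asked for
    print("ANSWER:\n", answer)

Even that only surfaces whatever the model chooses to put in the tag; anything filtered or summarized before it is emitted stays invisible, which is exactly the transparency gap I'm worried about.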

Is there an equivalent of formal verification for model inference?


