Cursor was good for a little while until VSCode opened up the APIs for AI editing. Now Copilot is really good and other extensions (specifically Kilo Code) are doing things so much better!
I am seeing a lot of folks talking about maintaining a good "Agent Loop" for doing larger tasks. It seems like Kilo Code has figured it out completely for me. Using the Orchestrator mode I'm able to accomplish really big and complex tasks without having to design an agent loop or hand-craft context. It switches between modes and accomplishes the tasks. My AGENTS.md file is really minimal, something like "write tests for changes and make small commits".
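For reference, a minimal AGENTS.md along those lines (my paraphrase of the rules quoted above, not the actual file):

```markdown
# AGENTS.md

- Write tests for any change you make.
- Make small, focused commits.
```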
I feel like I've hit a sweet spot for my use case, but am so behind the times. I've been a developer for 20 years and I'm not interested in vibe coding or letting an agent run wild on my full code base.
Instead, I'll ask Cursor to refactor code that I know is inefficient. Abstract repetitive code into functions or includes. Recommend (but not make) changes to larger code blocks or modules to make them better. Occasionally, I'll have it author new functionality.
What I find is, Cursor's autocomplete pairs really well with the agent's context. So, even if I only ask it for suggestions and tell it not to make the change, when I start implementing those changes myself (either some or all), the shared context kicks in and autocomplete starts providing suggestions in the direction of the recommendation.
However, at any time I can change course and Cursor picks up very quickly on my new direction and the autocomplete shifts with me.
It's so powerful when I'm leading it to where I know I want to go, while having enormous amounts of training data at the ready to guide me toward best practices or common patterns.
I don't run any .md files though. I wonder what I'm missing out on.
Abstraction for abstraction's sake is usually bad. What you should aim for is aligning it to the domain so that feature change requests are proportional to the work that needs to be done. Small changes, small PRs.
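A hypothetical sketch of what "aligned to the domain" can look like (names and numbers are mine): the domain concept lives in one place, so the feature request stays a small PR.

```typescript
// Hypothetical illustration of a domain-aligned abstraction.
// A "late fee" feature request touches only this module, so the PR stays small.

interface Invoice {
  amountDue: number; // in cents
  dueDate: Date;
}

// Domain-shaped function: the concept "late fee" has exactly one home.
function lateFee(invoice: Invoice, today: Date): number {
  const daysLate = Math.max(
    0,
    Math.floor((today.getTime() - invoice.dueDate.getTime()) / 86_400_000)
  );
  return daysLate > 0 ? Math.round(invoice.amountDue * 0.05) : 0;
}

// Contrast: a generic applyAdjustments(records, rulesConfig) helper shared by
// invoices, refunds, and payroll forces every feature request to thread new
// options through shared code, so small changes turn into large PRs.
```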
Did something change with Kiro, or was I just using it wrong? I tried to have it make a simple MCP server based on docs, and it seriously spent 6 hours without making a basic MVP. It looked like the most impressive planner and executor while working, but it just made a mess.
I never understood the point of the pelican on a bicycle exercise:
LLM coding agents don't have any way to see the output.
It means the only thing this test is testing is the ability of the LLM to memorise.
Because it exercises thinking about a pelican riding a bike (not common) and then describing that using SVG. It's quite nice imho and seems to scale with the power of the model. I'm sure Simon has some actual reasons, though.
I wouldn't say any LLMs are good at it. But it doesn't really matter, it's not a serious thing. It's the equivalent of "hello world" - or whatever your personal "hello world" is - whenever you get your hands on a new language.
The coordinates and shapes of the elements used to form a pelican.
If you think about how LLMs ingest their data, they have no way to know how to form a pelican in SVG.
I bet their ability to form a pelican comes purely from someone having already done it before.
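To be concrete about what "forming a pelican in SVG" involves, it means choosing primitives and coordinates roughly like this (a deliberately crude sketch of my own, not real model output):

```typescript
// Crude illustration of the kind of output being asked for: a few SVG
// primitives whose coordinates have to be chosen so the result reads as "pelican".
const pelicanSvg = `
<svg xmlns="http://www.w3.org/2000/svg" viewBox="0 0 200 120">
  <ellipse cx="90" cy="70" rx="40" ry="25" fill="white" stroke="black"/> <!-- body -->
  <circle cx="135" cy="40" r="12" fill="white" stroke="black"/>          <!-- head -->
  <polygon points="145,40 185,48 145,52" fill="orange"/>                 <!-- long beak/pouch -->
  <line x1="80" y1="95" x2="80" y2="115" stroke="black"/>                <!-- leg -->
  <line x1="100" y1="95" x2="100" y2="115" stroke="black"/>              <!-- leg -->
</svg>`;

console.log(pelicanSvg);
```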
> If you think about how LLMs ingest their data, they have no way to know how to form a pelican in SVG.
It's called generalization and yes, they do. I bet you could find plenty of examples of it working on something that truly isn't "present in the training data".
It's funny, you're so convinced that it's not possible without direct memorization but forgot to account for emergent behaviors (which are frankly all over the place in LLMs; where have you been?).
At any rate, the pelican thing from simonw is clearly just for fun at this point.
Assuming they updated the crawled training data, just having a bunch of examples of specifically pelicans on bicycles from other models is likely to make a difference.
But then how does the quality increase? Normally we hear that when models are trained on the output of other models, the style becomes very muted and various other issues start to appear. But this is probably the best pelican on a bicycle I've ever seen, by quite some margin.
I thought a human would be a considerable step up in complexity, but I asked it first for a pelican[0] and then for a rat[1] to get out of the bird world, and it did a great job on both.
But just for thrills I also asked for a "punk rocker"[2] and the result--while not perfect--is leaps and bounds above anything from the last generation.
0 -- ok, here's the first hurdle! It's giving me "something went wrong" when I try to get a share link on any of my artifacts. So for now it'll have to be a "trust me bro" and I'll try to edit this comment soon.
Price is playing a big role in my AI usage for coding. I am using Grok Code Fast as it's super cheap, and next to it GPT-5 Codex. If you are paying for model use out of pocket, Claude prices are super expensive. With a better tooling setup, those less-smart (and often faster) models can give you better results.
I am going to give this another shot but it will cost me $50 just to try it on a real project :(
Same here. I've been using GCF1 with opencode and getting good results. I also started using [Serena](https://github.com/oraios/serena), which has been really helpful in a large codebase. It gives you better search than plain grep, so you can quickly find what you need instead of dumping huge chunks of code into Claude or Grok and wasting tokens.
I really struggle to see the use case for Grok Code Fast when you have Qwen 3 Coder right there providing much better outputs while still being fast and cheap.
I'm paying $90(?) a month for the Max plan and it holds up for about an hour or so of in-depth coding before the 5-hour window lockout kicks in (so effectively about 4 hours of time when I can't run it). Kinda frustrating, even with efficient prompt and context-length conservation techniques. I'm going to test this new Sonnet 4.5 now, but it'll probably be just as quick to gobble my credits.
You have got to have some extremely large files or something. Even with only Opus, running into the limits with the Max subscription is almost impossible unless you really try.
I'm too cheap to pay for any of them. I've only tried gpt-oss:20b because I can run it locally and it's a complete waste of time for anything except code completions.
This was such a pleasure to read! Thank you for sharing!
My understanding is that solvers are like regexes: they can easily get out of hand in runtime complexity. At least this is what I have experienced with iOS's AutoLayout solver.
Any tool that can solve hard problems will also have non-trivial runtime behavior. That is an unfortunate fact. But you are also correct that combinatorial optimization solvers (CP, SAT, SMT, MIP, ...) often have quite sharp edges that are non-intuitive (a toy sketch below illustrates the blowup).
For the iOS AutoLayout, what kind of issues have you seen, and how complex were the problems?
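To make the "sharp edges" point concrete with a toy example of my own (nothing to do with AutoLayout's actual algorithm): the naive way to satisfy boolean constraints is to try every assignment, and the work doubles with each extra variable. Real solvers avoid this with heuristics, which is exactly why their runtime is hard to predict.

```typescript
// Toy illustration of combinatorial blowup: a brute-force SAT check.
// Each clause is a list of literals; positive n means variable n, negative n
// means its negation. The search space is 2^numVars.
type Clause = number[];

function bruteForceSat(numVars: number, clauses: Clause[]): boolean[] | null {
  for (let mask = 0; mask < 2 ** numVars; mask++) {
    // Decode the bitmask into a candidate truth assignment.
    const assignment = Array.from({ length: numVars }, (_, i) => ((mask >> i) & 1) === 1);
    const satisfied = clauses.every(clause =>
      clause.some(lit => (lit > 0 ? assignment[lit - 1] : !assignment[-lit - 1]))
    );
    if (satisfied) return assignment;
  }
  return null; // unsatisfiable
}

// (x1 OR x2) AND (NOT x1 OR x3) AND (NOT x2 OR NOT x3)
console.log(bruteForceSat(3, [[1, 2], [-1, 3], [-2, -3]]));
```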
I'm furnishing a new apartment and Nano Banana has been super useful for placing furniture I want to purchase in rooms to make a judgment if things will work for us or not. Take a picture of the room, feed Nano Banana with that picture and the product picture and ask it to place it in the right location. It can even imagine things at night or even add lamps with lights on. Super useful!
npm should take responsibility and up their game here. It's possible to analyze the code, mark it as suspicious, and delay the publish for stuff like this. It should prevent publishing code like this even if I have a gun to my head.
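A rough sketch of what "analyze the code and mark it as suspicious" could mean in practice (toy heuristics of my own, not anything npm actually runs):

```typescript
// Toy heuristic scan of a package's files before publish (my own illustration;
// a real pipeline would be far more sophisticated and tuned for false positives).
import { readFileSync } from "node:fs";

const RED_FLAGS = [
  /child_process/,                  // spawning shells from a library
  /eval\s*\(/,                      // dynamic code execution
  /\bpostinstall\b/,                // install-time hooks, a common malware entry point
  /https?:\/\/\d+\.\d+\.\d+\.\d+/,  // hard-coded raw IP endpoints
];

function looksSuspicious(files: string[]): string[] {
  const hits: string[] = [];
  for (const file of files) {
    const source = readFileSync(file, "utf8");
    for (const flag of RED_FLAGS) {
      if (flag.test(source)) hits.push(`${file}: matches ${flag}`);
    }
  }
  return hits;
}

// e.g. looksSuspicious(["package.json", "index.js"]): any hits would quarantine
// the release and delay the publish for manual review.
```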
Why would npm care? They're basically a monopoly in the JS world and under the stewardship of a company that doesn't even care when its host nation gets hacked when using their software due to their ineptitude.
> but provide some kind of 'verified' badge to the package
I would worry that that results in a false sense of security. Even if the actual badge says "passes some heuristics that catch only the most obvious malicious code", many people will read "totally 100% safe, please use with reckless abandon".
I always thought this would be the ideal monetization path for NPM; enterprises pay them, NPM only supplies verified package releases, ideally delayed by hours/days after release so that anything that slips through the cracks has a chance to get caught.
Absolutely not. You get npm packages by pulling, not by them being pushed to you as soon as a new version exists. The likelihood of you updating instantly is close to zero, and if it isn't, you should set your stuff up so that it is. Many ways to do that.
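One such way, as a minimal example (assuming npm; other package managers have equivalents): record exact versions and install strictly from the lockfile, so a freshly poisoned release can't slip in through a loose semver range.

```
# .npmrc: record exact versions instead of ^ranges when adding dependencies
save-exact=true

# and in CI, install strictly from the lockfile rather than re-resolving ranges:
#   npm ci
```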
Even better when compared to a month or two, which is how long it often takes for a researcher to find carefully planted malware.
Anyway, the case where reactive tools (detections, warnings) don't catch it is why LavaMoat exists. It prevents whole classes of malware from working at runtime.
The article (and repo) demonstrates that.
Sure, it should never happen in a CI environment. But I bet that every second, someone in the world is running "npm install" to bring in a new dependency to a new/existing project, and the impact of a malicious release can be broad very quickly. Vibe coding is not going to slow this down.
Vibe coding brings up the need for even more granular isolation. I'm on it ;)
The LavaMoat Webpack Plugin will soon have the ability to treat parts of your app the same as it currently treats packages: with isolation and a policy limiting what they can do.
I've worked in software supply chain security for two years now and this is an extremely optimistic take. Nearly all organizations are not even remotely close to this level of responsiveness.
They do; I use a YubiKey and it requires me to authenticate with it whenever I publish. They do support weaker 2FA methods as well, but you can choose.
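For anyone setting this up, the relevant npm commands look roughly like this (check `npm help profile` for the current options):

```
# require a second factor for both login and publishing
npm profile enable-2fa auth-and-writes

# publishing then prompts for (or accepts) a one-time password
npm publish --otp=123456
```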