> Anytime more than a class or two is involved or if the code base is more than 20 or 30 files, then even the best LLMs start to stray and lose focus. They can't seem to keep a train of thought which leads to churning way too much code.
At least on this specific point: you can have a 20-30 file codebase with 5-10 classes in the context in full, and the rest filtered in a sensible way. Even a model with a 200k-token context window can handle this.
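A minimal sketch of what "filtered in a sensible way" could look like: include a handful of core files in full and only the first few lines (imports, class signatures) of everything else. The file names and cutoffs here are just assumptions for illustration.

```python
import os

# Hypothetical sketch: pack a small codebase into one prompt string.
# Files in CORE go in verbatim; the rest are truncated to HEAD_LINES
# lines as a crude "filtered" view. Names below are made up.
CORE = {"models.py", "pipeline.py"}   # files the task actually touches
HEAD_LINES = 20                       # rough summary of the rest

def pack_context(root: str) -> str:
    parts = []
    for dirpath, _, files in os.walk(root):
        for name in sorted(files):
            if not name.endswith(".py"):
                continue
            path = os.path.join(dirpath, name)
            with open(path, encoding="utf-8") as f:
                text = f.read()
            if name not in CORE:
                # keep only the top of the file: imports and signatures
                text = "\n".join(text.splitlines()[:HEAD_LINES])
            parts.append(f"### {path}\n{text}")
    return "\n\n".join(parts)
```

Real setups would filter smarter (by dependency graph, by recent edits), but even this naive version keeps the prompt far under a 200k window for a 20-30 file project.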
The output definitely can stray, but in my experience that's not the norm. Of course, if the output does start to stray, it needs to be nipped in the bud. And the fixes can range anywhere from working-but-bad code to something very close to how you'd have written it yourself, if you've clearly described how you want the code to be written.
If you're trying to fix a specific bug, for example, but don't provide thorough logs of what's happening in the code, it's much more likely the output will stray towards some average of what the problem could be, rather than what it actually is in the current code.
And this is absolutely not to say that LLMs could do what Antirez is doing. There is a massive amount of variation in how deeply people think about the code they're reading or writing.
Yeah, I'm saying that I expect a world where we can cram almost all business knowledge into the context window for coding LLMs. We can get an early glimpse of that today by pasting the full contents of 50+ files into Gemini 2.5 and still using only 10% of its context window, and today is the worst it'll ever be.
I paste in the entire codebase for my small ETL project (100k tokens) and it’s pretty good.
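For anyone curious how far a codebase goes into a window, here's a rough sketch of the math: walk the project, count characters, and divide by ~4 as a crude chars-per-token rule of thumb (a real tokenizer like the provider's token-counting API would be more accurate; the extensions and window size below are assumptions).

```python
import os

# Rough sketch: estimate what fraction of a model's context window an
# entire small project would occupy if pasted in verbatim.
# The chars/4 ratio is a crude rule of thumb, not a real tokenizer.
def estimate_context_use(root: str, window: int = 200_000) -> float:
    total_chars = 0
    for dirpath, _, files in os.walk(root):
        for name in files:
            if name.endswith((".py", ".sql", ".md")):  # assumed file types
                path = os.path.join(dirpath, name)
                with open(path, encoding="utf-8", errors="ignore") as f:
                    total_chars += len(f.read())
    approx_tokens = total_chars // 4
    return approx_tokens / window  # fraction of the window consumed
```

A ~100k-token ETL project like the one above would come out around 0.5 of a 200k window by this estimate, which squares with it fitting comfortably.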
Not perfect, still a long way to go, but a sign of the times to come.