Adding extra structural information about the codebase is an avenue we're actively exploring. Agentic exploration is a structure-aware system where you're using a frontier model (Claude 4 Sonnet or equivalent) that gives you an implicit binary relevance score based on whatever you're putting into context -- filenames, graph structures, etc.
If a file is "relevant" the agent looks at it and decides if it should keep it in context or not. This process repeats until there's satisfactory context to make changes to the codebase.
The question is whether we actually need a 200b+ parameter model to do this or if we can distill the functionality onto a much smaller, more economical model. A lot of people are already choosing to do it with Gemeni (due to the 1m context window), and they write the code with Claude 4 Sonnet.
Ideally, we want to be able to run this process cheaply in parallel to get really fast generations. That's the ultimate goal we're aiming towards
If a file is "relevant" the agent looks at it and decides if it should keep it in context or not. This process repeats until there's satisfactory context to make changes to the codebase.
The question is whether we actually need a 200b+ parameter model to do this or if we can distill the functionality onto a much smaller, more economical model. A lot of people are already choosing to do it with Gemeni (due to the 1m context window), and they write the code with Claude 4 Sonnet.
Ideally, we want to be able to run this process cheaply in parallel to get really fast generations. That's the ultimate goal we're aiming towards