The diffusion approach is really interesting -- it's something we haven't checked out for applying edits just yet. It could work quite well though!
You can definitely use it for markdown, but we haven't seen anyone test it for plaintext yet. I'm sure it would work though, let us know if you end up trying it!
Adding extra structural information about the codebase is an avenue we're actively exploring. Agentic exploration is a structure-aware system where you're using a frontier model (Claude 4 Sonnet or equivalent) that gives you an implicit binary relevance score based on whatever you're putting into context -- filenames, graph structures, etc.
If a file is "relevant" the agent looks at it and decides if it should keep it in context or not. This process repeats until there's satisfactory context to make changes to the codebase.
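To make that loop concrete, here's a toy sketch in Python. It isn't our actual implementation: `explore`, `is_relevant`, `should_keep`, and `is_enough` are made-up names, and in a real system each of those callbacks would be a model call rather than a simple function.

```python
from typing import Callable, Dict, List


def explore(
    candidates: List[str],
    read_file: Callable[[str], str],
    is_relevant: Callable[[str], bool],
    should_keep: Callable[[str, str], bool],
    is_enough: Callable[[Dict[str, str]], bool],
) -> Dict[str, str]:
    """Collect file contents until the context looks sufficient.

    All callbacks are hypothetical stand-ins for model decisions:
    - is_relevant: binary relevance guess from the filename alone
    - should_keep: decision made after actually reading the file
    - is_enough: whether the gathered context is satisfactory
    """
    context: Dict[str, str] = {}
    for path in candidates:
        if is_enough(context):
            break  # satisfactory context reached, stop exploring
        if not is_relevant(path):
            continue  # skip files that don't look relevant
        text = read_file(path)
        if should_keep(path, text):
            context[path] = text  # keep this file in context
    return context


# Toy usage with an in-memory "codebase" and trivial rules.
files = {
    "auth.py": "def login(): ...",
    "README.md": "project docs",
    "session.py": "def login_session(): ...",
}
ctx = explore(
    candidates=list(files),
    read_file=files.__getitem__,
    is_relevant=lambda p: p.endswith(".py"),
    should_keep=lambda p, t: "login" in t,
    is_enough=lambda c: len(c) >= 2,
)
```

The interesting design question is which of those callbacks really need a frontier model and which could be handled by something much smaller.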
The question is whether we actually need a 200B+ parameter model to do this or if we can distill the functionality onto a much smaller, more economical model. A lot of people are already choosing to do it with Gemini (due to the 1M context window), and they write the code with Claude 4 Sonnet.
Ideally, we want to be able to run this process cheaply in parallel to get really fast generations. That's the ultimate goal we're aiming for.
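As a rough illustration of the "cheaply in parallel" part, here's a minimal sketch that fans the relevance check out over a thread pool. `score_in_parallel` and `is_relevant` are hypothetical names; `is_relevant` stands in for a call to a small, cheap model, and nothing here reflects how we actually run it.

```python
from concurrent.futures import ThreadPoolExecutor
from typing import Callable, List


def score_in_parallel(
    paths: List[str],
    is_relevant: Callable[[str], bool],
    max_workers: int = 8,
) -> List[str]:
    """Run the relevance check on every path concurrently.

    is_relevant is a hypothetical stand-in for a request to a small model.
    pool.map returns results in input order, so the output keeps the
    original ordering of paths.
    """
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        flags = list(pool.map(is_relevant, paths))
    return [path for path, keep in zip(paths, flags) if keep]
```

Threads are a reasonable fit here because the work would be dominated by waiting on network calls; the cheaper each call is, the more of them you can afford to fire off at once.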
Cline orchestrates all the models under the hood, so you could use our apply model with Cline. Not sure which model they're using for that feature right now.
We trained it on over a dozen languages, with a bias towards TypeScript and Python. We've seen it work on Markdown pretty well, but you could try it on plaintext too -- curious to hear how that goes.
Open source git repos are a really good place to get data -- it requires a lot of munging to get it into a useful format, but that's the name of the game with model training.
It's on the roadmap to publish evals people can use to compare their options. A lot of the current benchmarks aren't really specialized for these prompt-to-app use cases.
Hey, really appreciate the detailed write-up of the sign-up journey here! Getting the simplest flow is hard, and it's something we obsess over. The docs have been a work in progress for the past couple of months, but now that they're getting better I think it's a good idea to make them more front and center for new users.
We're trying to make this as accessible as possible to the open-source community with our free tier, but feel free to reach out if you need expanded rate limits. Cheers :)