> Build your own foundation model for coding. tbh I'm skeptical that a company can do this better than the Big 3 AI cos.
They already have their own "tab" model, which may not be very large but is definitely better than most open-weight models at short code-snippet completion. And for larger agentic LLMs, they could totally start from a pre-trained base model (e.g. DeepSeek V3) and only do post-training/RL/fine-tuning, which is doable with a small team and their cash reserves.
It's not hard to imagine a good base model (DeepSeek V4, perhaps?) + Cursor's user data leading to a model that surpasses Sonnet/Gemini on coding tasks specifically.