DeepSeek reportedly cost just over $5M in compute to train. StarCoder cost around $1M; there is no public figure for StarCoder2, but it's unlikely to be more than a few million. The idea that you need to spend billions on training is OpenAI trying to build a moat that might not actually exist.
These architectures didn't exist last year. The Chinese are innovating thanks to massive government backing, access to talent, and a clear focus on winning the AI race.
StarCoder is an open-source LLM for code, not text. Builder.ai told investors they were building a virtual assistant called "Natasha", not a code assistant.
I'm just telling you what I read online. Builder.ai wasn't competing with GitHub Copilot, Cody, or CodeWhisperer. Those are code assistants for developers. Builder.ai was building a virtual assistant for customers. It was meant to "talk" to clients, gather requirements, and automate parts of the build process. Very different space.
And like I said before, pre-training a dedicated foundation model is expensive, to say nothing of a full LLM built from scratch.
The point is that one doesn't need billions to create useful models (there is no reason they would have needed new foundation models in the first place), and that "running out of money" is unlikely to have been their main problem: $250M should be more than enough to build an AI website builder.
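For rough scale, a quick back-of-envelope sketch. The GPU-hour rate and the fine-tuning budget below are illustrative assumptions of mine, not real quotes; only the pre-training figure mirrors DeepSeek-V3's publicly reported ~2.8M H800 GPU-hours:

```python
# Back-of-envelope compute costs: pre-training a foundation model
# vs. fine-tuning an existing open one. The GPU-hour rate and the
# fine-tuning hours are illustrative assumptions, not real quotes;
# the pre-training hours mirror DeepSeek-V3's reported ~2.8M GPU-hours.

GPU_HOUR_RATE_USD = 2.0          # assumed rental rate per H800 GPU-hour

pretrain_gpu_hours = 2_800_000   # scale of a frontier pre-training run
finetune_gpu_hours = 10_000      # assumed budget to fine-tune an open model

print(f"Pre-training: ${pretrain_gpu_hours * GPU_HOUR_RATE_USD:>13,.0f}")
print(f"Fine-tuning:  ${finetune_gpu_hours * GPU_HOUR_RATE_USD:>13,.0f}")
# Pre-training: $    5,600,000
# Fine-tuning:  $       20,000
```

Even the pre-training figure is about 2% of a $250M raise, which is the point: compute alone doesn't explain running out of money.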
I get what you are saying, but you're missing the most important piece of the puzzle: the data.
Everyone talks about models and infrastructure, but without the right data, you've got nothing. And that's where the biggest hidden cost is.
According to the company's own website, they were creating the data themselves: labelled datasets, transcripts, customer conversations, documents, and more. That's where the real money goes. Millions and millions.
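To gauge whether "millions and millions" is plausible, here's a quick sketch; every rate and volume below is a hypothetical assumption on my part, not a Builder.ai number:

```python
# Back-of-envelope cost of hand-building a proprietary training corpus.
# All rates and volumes are illustrative assumptions, not reported figures.

LABEL_COST_PER_EXAMPLE_USD = 0.50       # assumed cost to label one example
TRANSCRIPTION_COST_PER_HOUR_USD = 90.0  # assumed cost to transcribe + annotate 1h of calls

labeled_examples = 5_000_000            # assumed labelled-dataset size
conversation_hours = 100_000            # assumed hours of customer conversations

labeling = labeled_examples * LABEL_COST_PER_EXAMPLE_USD
transcripts = conversation_hours * TRANSCRIPTION_COST_PER_HOUR_USD

print(f"Labeling:    ${labeling:>12,.0f}")                # $   2,500,000
print(f"Transcripts: ${transcripts:>12,.0f}")             # $   9,000,000
print(f"Total:       ${labeling + transcripts:>12,.0f}")  # $  11,500,000
```

Even with fairly modest per-unit rates, bespoke data collection lands in the eight figures quickly, so the data budget can easily dwarf the compute budget.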