
> I'd also pay for something in the 10-15b parameter range that used more limited training data focused almost entirely on programming documentation and books along with professional business writing.

Unfortunately, pretraining on a lot of data (~everything they can get their hands on) is what gives current LLMs their "intelligence" (for whatever definition of intelligence). Training on less data just doesn't work as well, at least for now. There's definitely not enough programming documentation and business writing to train a good model on that alone.
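
A rough sanity check on the scale gap (all figures below are ballpark public numbers, not exact: ~15T tokens is the reported Llama 3 pretraining budget, ~1T is roughly what deduped open code corpora like The Stack yield, and the business-writing figure is my own generous guess):

    # Back-of-envelope: could a domain-only corpus match a pretraining run?
    PRETRAIN_TOKENS = 15e12    # reported Llama 3 pretraining scale, ~15T tokens
    CODE_TOKENS = 1e12         # deduped open code (The Stack), order of magnitude
    BUSINESS_TOKENS = 0.1e12   # professional business writing: assumed, likely less

    domain_total = CODE_TOKENS + BUSINESS_TOKENS
    print(f"domain corpus covers ~{domain_total / PRETRAIN_TOKENS:.0%} "
          f"of a typical pretraining run")
    # -> ~7%: an order of magnitude short, even with generous assumptions

So even if you scraped every line of code and business document available, you'd be roughly 10x short of what current frontier models apparently need to develop general competence.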



If the LLM isn’t getting its data about coding projects from those projects and their surrounding documentation and tutorials, what is it going to train with?

Maybe it also needs some amount of other training data for basic speech patterns, but I'd again point to IBM Granite as evidence that professional, to-the-point LLMs are possible.



