
There is a "small language model", and then there is a "small LARGE language model". In late 2018, BERT (110 million params) would've been considered a "large" language model. A "small" LM would be some markov chain or a topic model (e.g. latent dirichlet allocation) - technically they would be considered generative language models since they learn joint distributions of params and data (words), and can then sample from that distribution. But today, we usually map "small" LMs to "small" LLMs, so in that sense a small LLM would be anything from BERT to around 3-4B params.

