> The cheapest (free or cash strapped services) will implement several (hidden/opaque) ways to reduce the cost of answering a query by limiting the depth and breadth of its analysis.

This already happens. Many of the cheap API providers aggressively quantize the weights and KV cache without making clear that they do.
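
For anyone wondering what "aggressively quantize" costs in practice, here is a rough sketch of the idea, not any provider's actual pipeline. It assumes plain symmetric per-tensor quantization applied to random stand-in "weights"; the function names are made up for illustration, and real serving stacks use per-channel or per-group schemes that behave better than this.

```python
import random


def quantize_roundtrip(values, bits):
    """Quantize to signed `bits`-bit integers with one shared scale, then dequantize.

    Hypothetical helper: symmetric per-tensor quantization, the simplest scheme.
    """
    qmax = 2 ** (bits - 1) - 1  # 127 for 8 bits, 7 for 4 bits
    scale = max(abs(v) for v in values) / qmax
    if scale == 0.0:
        return list(values)  # all zeros, nothing to quantize
    quantized = [max(-qmax, min(qmax, round(v / scale))) for v in values]
    return [q * scale for q in quantized]


def mean_abs_error(original, restored):
    """Average absolute difference between two equal-length lists."""
    return sum(abs(a - b) for a, b in zip(original, restored)) / len(original)


# Stand-in for a weight tensor (or a slice of KV cache): Gaussian noise.
random.seed(0)
weights = [random.gauss(0.0, 1.0) for _ in range(4096)]

err8 = mean_abs_error(weights, quantize_roundtrip(weights, 8))
err4 = mean_abs_error(weights, quantize_roundtrip(weights, 4))
print(f"8-bit mean abs error: {err8:.5f}")
print(f"4-bit mean abs error: {err4:.5f}")
```

Fewer bits means a coarser grid and a larger round-trip error, which is the memory and bandwidth saving being traded against output quality. How much that error matters for actual model answers depends on the model and the scheme, which this toy does not measure.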


