> The cheapest (free or cash strapped services) will implement several (hidden/opaque) ways to reduce the cost of answering a query by limiting the depth and breadth of its analysis.
This already happens. Many of the cheap API providers aggressively quantize the weights and KV cache without making clear that they do.
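For anyone who hasn't looked at what quantization does to the numbers, here is a toy sketch of why it trades quality for cost. It assumes plain symmetric round-to-nearest with a single scale for the whole tensor. Real serving stacks typically use finer-grained schemes, and nothing here reflects what any particular provider does. The function names and the Gaussian "weights" are made up for illustration.

```python
import random

def quantize_dequantize(values, bits):
    """Symmetric round-to-nearest quantization to `bits` bits, then back to float.

    Hypothetical helper for illustration: one scale for the whole list.
    """
    qmax = 2 ** (bits - 1) - 1  # 127 for 8 bits, 7 for 4 bits
    max_abs = max(abs(v) for v in values)
    scale = max_abs / qmax if max_abs > 0 else 1.0
    out = []
    for v in values:
        q = max(-qmax, min(qmax, round(v / scale)))  # integer code
        out.append(q * scale)                        # reconstructed float
    return out

def mean_abs_error(a, b):
    """Average absolute difference between two equal-length lists."""
    return sum(abs(x - y) for x, y in zip(a, b)) / len(a)

# Stand-in for a weight tensor (assumption: small zero-mean Gaussian values).
random.seed(0)
weights = [random.gauss(0.0, 0.02) for _ in range(4096)]

err8 = mean_abs_error(weights, quantize_dequantize(weights, 8))
err4 = mean_abs_error(weights, quantize_dequantize(weights, 4))
print(f"8-bit mean abs error: {err8:.6f}")
print(f"4-bit mean abs error: {err4:.6f}")
```

The 4-bit reconstruction is noticeably further from the original values than the 8-bit one, which is the part a user cannot see from the outside: the storage and bandwidth savings are real, but so is the rounding error, and the same idea applied to the KV cache perturbs the attention inputs as well.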