The problem with this is that no one can agree on what "at scale" actually means.
Like yes, everyone knows that if you want to index the whole internet and have tens of thousands of searches a second there are unique challenges and you need some crazy complexity. But if you have a system that has 10 transactions a second...you probably don't. The simple thing will probably work just fine. And the vast majority of systems will never get that busy.
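For a sense of how small 10 TPS really is, here's a quick back-of-envelope (plain arithmetic, not from the thread itself, so treat the framing as illustrative):

```python
# Back-of-envelope: what "10 transactions a second" means in absolute terms.
# Assumption: uniform load, no peaks -- real traffic is burstier than this.
tps = 10
per_day = tps * 60 * 60 * 24   # 864,000 transactions/day
per_year = per_day * 365       # ~315 million transactions/year
print(f"{per_day:,}/day, {per_year:,}/year")
```

Under a million requests a day is well within what one modest database on one machine handles without any distributed machinery.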
Computers are fast now! One powerful server (with a second powerful server, just in case) can do a lot.
Yeah, we do 100k ML inferences per second. It's not a single server, but the architecture isn't much more complicated than that.
With today's computers, indexing the entire internet and serving 100k QPS also isn't really that demanding architecturally. The vast majority of current implementation complexity exists for reasons other than necessity.
Yep, vertical scaling goes a long way. But the bottleneck for scale isn't compute, it's resiliency and availability.
So although a single server goes a long way, to hit that sweet 99.999% SLA, people scale horizontally well before hitting the maximum compute capacity of a single machine. HA makes everything much harder to operate and reason about.
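To make the five-nines number concrete, here's a sketch of the downtime budget each extra nine implies (standard SLA arithmetic; assumes a flat 365-day year with no maintenance carve-outs):

```python
# Yearly downtime budget implied by an availability SLA of N nines.
# Assumption: 365-day year, no scheduled-maintenance exclusions.
MINUTES_PER_YEAR = 365 * 24 * 60  # 525,600

def downtime_minutes(nines: int) -> float:
    """Allowed downtime per year for 99.9...% availability ('nines' nines)."""
    availability = 1 - 10 ** -nines
    return MINUTES_PER_YEAR * (1 - availability)

for n in (3, 4, 5):
    print(f"{n} nines: {downtime_minutes(n):.2f} minutes/year")
```

Five nines leaves roughly five minutes of downtime a year, which is less than a single reboot of one big server. That budget, not raw throughput, is what pushes people to redundant, horizontally scaled setups.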