The problem with this is that no one can agree on what "at scale" actually means.
Like yes, everyone knows that if you want to index the whole internet and have tens of thousands of searches a second there are unique challenges and you need some crazy complexity. But if you have a system that has 10 transactions a second...you probably don't. The simple thing will probably work just fine. And the vast majority of systems will never get that busy.
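For a sense of how small 10 TPS really is, here's a quick back-of-envelope (plain arithmetic, not from the thread itself, so treat the framing as illustrative):

```python
# Back-of-envelope: what "10 transactions a second" means in absolute terms.
# Assumption: uniform load, no peaks -- real traffic is burstier than this.
tps = 10
per_day = tps * 60 * 60 * 24   # 864,000 transactions/day
per_year = per_day * 365       # ~315 million transactions/year
print(f"{per_day:,}/day, {per_year:,}/year")
```

Under a million requests a day is well within what one modest database on one machine handles without any distributed machinery.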
Computers are fast now! One powerful server (with a second powerful server, just in case) can do a lot.
Yeah, we do 100k ML inferences per second. It's not a single server, but the architecture isn't much more complicated than that.
With today's computers, indexing the entire internet and serving 100k QPS also isn't really that demanding architecturally. The vast majority of current implementation complexity exists for reasons other than necessity.
Yep, vertical scaling goes a long way. But the bottleneck for scale isn't compute, it's resiliency and availability.
So although a single server goes a long way, to hit that sweet 99.999% SLA, people scale horizontally well before hitting the maximum compute capacity of a single machine. HA makes everything much harder to operate and reason about.
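To make the five-nines number concrete, here's a sketch of the downtime budget each extra nine implies (standard SLA arithmetic; assumes a flat 365-day year with no maintenance carve-outs):

```python
# Yearly downtime budget implied by an availability SLA of N nines.
# Assumption: 365-day year, no scheduled-maintenance exclusions.
MINUTES_PER_YEAR = 365 * 24 * 60  # 525,600

def downtime_minutes(nines: int) -> float:
    """Allowed downtime per year for 99.9...% availability ('nines' nines)."""
    availability = 1 - 10 ** -nines
    return MINUTES_PER_YEAR * (1 - availability)

for n in (3, 4, 5):
    print(f"{n} nines: {downtime_minutes(n):.2f} minutes/year")
```

Five nines leaves roughly five minutes of downtime a year, which is less than a single reboot of one big server. That budget, not raw throughput, is what pushes people to redundant, horizontally scaled setups.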