Now I really want to know though.

cperciva · 2025-06-06T22:00:39 1749247239

My understanding is that EBS has some heuristics for deciding whether to keep data cached; an AMI which has a cached snapshot as its root disk will boot much faster than an AMI where all the data needs to be pulled from S3.

tedunangst · 2025-06-06T22:14:32 1749248072

Some huge customer chunked their data into 5GB pieces, so now there's an "if size == 5GB" in the cache code.

cperciva · 2025-06-06T22:17:28 1749248248

Maybe, but I don't think that would explain 8 GB also being fast while 6 GB is slow?

MobiusHorizons · 2025-06-07T16:49:18 1749314958

Yeah, I found that pretty unintuitive when I read it. How did you find 8GB worked? Trial and error?

0x457 · 2025-06-06T23:01:36 1749250896

Customer started using 8GB chunks /s

JoshTriplett · 2025-06-07T00:26:21 1749255981

What's the smallest size for which those heuristics keep the snapshot cached?

(I'm currently using 1GB snapshots, because my actual disk image is a tiny fraction of that size. But if bumping that to 2GB or 4GB would make it faster, that's a small price to pay.)

cperciva · 2025-06-07T00:27:23 1749256043

I believe 1 GB is also fast.

JoshTriplett · 2025-06-07T00:40:07 1749256807

Thanks, that helps to hear!

Do you have any other wisdom regarding mysterious reasons for fast or slow booting? EC2's boot process is deeply opaque, and any insight at all is better than nothing.

cperciva · 2025-06-07T01:11:24 1749258684

Nothing comes to mind, but if you want to drop me an email I can walk you through some benchmarking.

richardwhiuk · 2025-06-07T19:35:30 1749324930

At a guess, powers of 2 are fast?

cperciva · 2025-06-07T19:38:21 1749325101

5 is not a power of 2. ;-)

messe · 2025-06-07T22:13:19 1749334399

Gotta admit it's pretty close though.

selimnairb · 2025-06-07T11:53:18 1749297198

Yeah, I am constantly curious about how the sausage is made for cloud services like AWS. It seems generally slick on the surface, but what's holding it all together? I imagine it as a tangled ball of tools like Puppet, Chef, etc., plus custom glue.

arcfour · 2025-06-07T13:17:55 1749302275

A lot of AWS services are built on other AWS services. "Core services" like Lambda and SQS are used by other services under the hood.

akdev1l · 2025-06-07T12:09:21 1749298161

At Amazon scale, almost everything is custom

Less Puppet/Chef

selimnairb · 2025-06-07T13:40:06 1749303606

Yeah, I would imagine they maybe started with off-the-shelf tools that were then gradually replaced as the system grew and matured.

akdev1l · 2025-06-12T03:33:19 1749699199

Kind of the opposite: I think AWS was the first hyperscaler, so tooling did not exist for many of these problems back then

For example, they have their own custom clustering software, where you would probably use k8s if you were to rebuild things today

Repeat this across a million different tools, etc.

This article is interesting if you want to take a peek behind the curtain: https://www.allthingsdistributed.com/2014/11/apollo-amazon-d...