> But we had less than a terabyte of data

I really wonder how an in-place `pg_upgrade` of such a small amount of data would take 30+ minutes.

My experience, from a less mission-critical situation where 5 minutes of maintenance are absolutely acceptable, is that an in-place `pg_upgrade` with `--link` of an 8 TB database takes less than a minute and will not accidentally lose data, fail to properly configure schema search paths, or cause whatever other mess the article was talking about.

I understand that 30 minutes of downtime are not acceptable. But if it's 5 minutes or less, I would seriously consider an offline upgrade using `pg_upgrade`.

And if it takes 30 minutes to hard-link less than 1 TB of data files, you should seriously consider changing hosts because that's absolutely unacceptable performance.
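
For reference, the kind of offline upgrade I mean is roughly the following sketch (version numbers and paths are placeholders for a Debian-style layout; adjust for your install, and the new cluster must already be initdb'ed):

    # dry-run the checks first (this can run against the live old cluster)
    /usr/lib/postgresql/17/bin/pg_upgrade \
      --old-bindir=/usr/lib/postgresql/16/bin \
      --new-bindir=/usr/lib/postgresql/17/bin \
      --old-datadir=/var/lib/postgresql/16/main \
      --new-datadir=/var/lib/postgresql/17/main \
      --link --check

    # then stop the old cluster and re-run without --check; --link
    # hard-links the data files instead of copying them, so that step
    # is mostly independent of database size

The downtime is dominated by the catalog migration and by re-collecting stats afterwards, not by the data volume.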




> 30 minutes

The Lyft team reported 30 minutes for their 30TB database. Our db took about 15 minutes. In the essay we wrote:

> So we cloned our production database and tested an in-place upgrade. Even with our smaller size, it took about 15 minutes for the clone to come back online.


I don't think pg_upgrade takes the whole time. Some of it is overhead of the AWS managed database service, where it's creating a snapshot before and after, applying the new config, and spinning for no apparent reason.


Yeah, we just did it with the `--link` option on a 6 TB database and it took like 30 seconds. Something has to be off with their OS settings or disk speeds.

The main challenge with that is running an ANALYZE on all the tables afterwards, though; that took like 30 minutes, during which the DB was unusable.


These days it can analyze in stages, doing multiple passes with increasing stats sampling.

From personal experience, most of the queries become usable after the first stage has completed, which on my 8 TB database took less than 5 minutes.
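
The staged analyze is just the vacuumdb wrapper, something like this (the --jobs value is only an example):

    # three passes with default_statistics_target = 1, then 10, then the
    # configured value, so rough planner stats show up almost immediately
    vacuumdb --all --analyze-in-stages --jobs=8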


We did use the `--analyze-in-stages` option; I think our data model is just not optimal. We have a lot of high-frequency queries hitting very large tables of 0.5 to 1 billion rows. Proper indexing makes them fast, but until all the stats are there, the frontend is unusable.
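
One thing we might try next time (just a sketch, the table and database names here are made up) is analyzing the handful of hot tables by hand right after the upgrade, so the frontend queries get real stats before the staged pass reaches everything else:

    # hypothetical names; prioritize the tables the frontend hits hardest
    psql -d appdb -c "ANALYZE big_events;"
    psql -d appdb -c "ANALYZE user_actions;"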


Was it unusable because CPU/IO was maxed out during the ANALYZE?


Analyze itself isn’t the problem.

After pg_upgrade, no stats are available for the optimizer, which means any query will more or less seq-scan all affected tables.
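
A quick way to see which large tables still have no planner stats (the database name is a placeholder, and this only looks at the public schema):

    psql -d appdb -c "
      SELECT c.relname,
             pg_size_pretty(pg_total_relation_size(c.oid)) AS size
      FROM pg_class c
      WHERE c.relkind = 'r'
        AND c.relnamespace = 'public'::regnamespace
        AND NOT EXISTS (SELECT 1 FROM pg_stats s
                        WHERE s.schemaname = 'public'
                          AND s.tablename = c.relname)
      ORDER BY pg_total_relation_size(c.oid) DESC
      LIMIT 20;"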


No, it's the lack of stats on the tables; any query hitting a medium-to-large table would be extremely slow.



