> But we had less than a terabyte of data

I really wonder how an in-place `pg_upgrade` of such a small amount of data would take 30+ minutes.

My experience, from a less mission-critical situation where 5 minutes of maintenance are absolutely acceptable, is that an in-place `pg_upgrade` with `--link` of an 8 TB database takes less than a minute and will not accidentally lose data, fail to properly configure schema search paths, or cause whatever other mess the article was talking about.

I understand that 30 minutes of downtime are not acceptable. But if it's 5 minutes or less, I would seriously consider an offline upgrade using `pg_upgrade`.

And if it takes 30 minutes to hard-link less than 1 TB of data files, you should seriously consider changing hosts because that's absolutely unacceptable performance.
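
For reference, the kind of offline upgrade I mean is roughly the following sketch (version numbers and paths are placeholders for a Debian-style layout; adjust for your install, and the new cluster must already be initdb'ed):

    # dry-run the checks first (this can run against the live old cluster)
    /usr/lib/postgresql/17/bin/pg_upgrade \
      --old-bindir=/usr/lib/postgresql/16/bin \
      --new-bindir=/usr/lib/postgresql/17/bin \
      --old-datadir=/var/lib/postgresql/16/main \
      --new-datadir=/var/lib/postgresql/17/main \
      --link --check

    # then stop the old cluster and re-run without --check; --link
    # hard-links the data files instead of copying them, so that step
    # is mostly independent of database size

The downtime is dominated by the catalog migration and by re-collecting stats afterwards, not by the data volume.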




> 30 minutes

The Lyft team reported 30 minutes for their 30TB database. Our db took about 15 minutes. In the essay we wrote:

> So we cloned our production database and tested an in-place upgrade. Even with our smaller size, it took about 15 minutes for the clone to come back online.


I don't think pg_upgrade takes the whole time. Some of it is overhead of the AWS managed database service, where it's creating a snapshot before and after, applying the new config, and spinning for no apparent reason.


Yeah, we just did it with the `--link` option on a 6 TB database and it took like 30 seconds. Something has to be off with their OS settings or disk speeds.

The main challenge with that is running an ANALYZE on all the tables afterwards, though; that took like 30 minutes, during which the DB was unusable.


These days it can analyze in stages, doing multiple passes with increasing stats sampling.

From personal experience, most of the queries become usable after the first stage has completed, which on my 8 TB database took less than 5 minutes.
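
The staged analyze is just the vacuumdb wrapper, something like this (the --jobs value is only an example):

    # three passes with default_statistics_target = 1, then 10, then the
    # configured value, so rough planner stats show up almost immediately
    vacuumdb --all --analyze-in-stages --jobs=8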


We did use the `--analyze-in-stages` option; I think our data model is just not optimal. We have a lot of high-frequency queries hitting very large tables of 0.5 to 1 billion rows. Proper indexing makes them fast, but until all the stats are there, the frontend is unusable.
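
One thing we might try next time (just a sketch, the table and database names here are made up) is analyzing the handful of hot tables by hand right after the upgrade, so the frontend queries get real stats before the staged pass reaches everything else:

    # hypothetical names; prioritize the tables the frontend hits hardest
    psql -d appdb -c "ANALYZE big_events;"
    psql -d appdb -c "ANALYZE user_actions;"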


Was it unusable because CPU/IO was maxed out during the ANALYZE?


Analyze itself isn’t the problem.

After pg_upgrade, no stats are available for the optimizer, which means any query will more or less seq-scan all affected tables.
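
A quick way to see which large tables still have no planner stats (the database name is a placeholder, and this only looks at the public schema):

    psql -d appdb -c "
      SELECT c.relname,
             pg_size_pretty(pg_total_relation_size(c.oid)) AS size
      FROM pg_class c
      WHERE c.relkind = 'r'
        AND c.relnamespace = 'public'::regnamespace
        AND NOT EXISTS (SELECT 1 FROM pg_stats s
                        WHERE s.schemaname = 'public'
                          AND s.tablename = c.relname)
      ORDER BY pg_total_relation_size(c.oid) DESC
      LIMIT 20;"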


No, it's the lack of stats on the tables; any query hitting a medium-to-large table would be extremely slow.



