so you're pulling one line from the article to tell me that i'm wrong?
on-disk format/write amplification:
> For tables with a large number of secondary indexes, these superfluous steps can cause enormous inefficiencies. For instance, if we have a table with a dozen indexes defined on it, an update to a field that is only covered by a single index must be propagated into all 12 indexes to reflect the ctid for the new row.
How wide is their data? Depending on the answer to this, it could be that they've over-indexed, have a poor indexing strategy, or are reacting to the poor queries generated by an ORM (not sure if they use one, or if they hand-code their own SQL).
i'm not here to debate whether postgres is better than mysql. i'm just saying that it seems like a lot of research went into justifying a switch. who knows, maybe that research could have been spent optimizing their current environment.
> so you're pulling one line from the article to tell me that i'm wrong?
Better than telling them they're wrong without quoting even one line from their article, while misrepresenting it as something very different from what it is.
In what world can the response to the concerns and analysis in the article ever be: "data corruption: everyone has bugs"...
the same world where mysql lets you corrupt your own data. did you just stop reading? in what world does a data corruption event prompt you to change platforms to another platform that has a history of data corruption?
They do address other stuff. They have a huge write volume and need better write performance. Maybe not what you and I need, but hey, I suppose they know better what Uber needs?
MySQL handles it differently than Postgres, and gives them better performance for their purpose (based on their experience/tests). They were explaining it in the terms that MySQL and Postgres themselves use. If those are buzzwords, then MySQL and Postgres are both created using buzzwords?
Their explanation is not perfect (for me: why does their data model need massive updates?). But I wouldn't write it off as buzzwords or dismiss Postgres because of data corruption. There are a lot of other things they were trying to explain there.
It is. But at least MySQL provides both ways of shipping changes, WAL-shipping like row-based replication and the less reliable statement-based replication, and the DBA can choose which to use when.
Combine that fact with the different ways Postgres and InnoDB handle secondary indexes, and it means Postgres WAL-shipping ships not only the entire new row but also all the disk blocks of all the secondary index updates, unlike MySQL's row-based replication.
This is actually something I greatly like and want to see in Postgres. Perhaps the decision to choose secondary indexes with direct links to rows was taken because at that time (before replication) it was less read-heavy and the write amplification wasn't such a concern. But now, when replication is a common requirement (and network IO is not as fast as disk IO), it makes a lot of sense to switch to a way of storing secondary indexes that allows a single point of update.
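To make the difference concrete, here's a rough sketch of the two index designs. It's a toy model, not how either engine is actually implemented: it only counts index entries written per update, and it ignores Postgres HOT updates, page-level effects, and everything else a real storage engine does. The function and column names are made up for illustration.

```python
# Toy model only: counts index entries written per UPDATE.
# Function names and column names are hypothetical, not any engine's API.

def index_writes_direct(indexed_columns, changed_columns):
    """Secondary indexes point at the physical row location (Postgres-style ctid).

    A new row version lives at a new location, so in this model every
    index gets a new entry, whether or not its column changed.
    """
    return len(indexed_columns)


def index_writes_via_primary_key(indexed_columns, changed_columns):
    """Secondary indexes point at the primary key (InnoDB-style).

    The primary key stays the same on update, so only indexes whose
    column actually changed need to be touched.
    """
    return len(set(indexed_columns) & set(changed_columns))


indexed = [f"col_{i}" for i in range(12)]  # the article's "dozen indexes"
changed = ["col_3"]                        # update touches one indexed field

direct = index_writes_direct(indexed, changed)
indirect = index_writes_via_primary_key(indexed, changed)
print(direct, indirect)  # prints "12 1"
```

With direct row pointers, each of those index writes also ends up in the replication stream, which is the part that matters once you're shipping changes over the network.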
> [...] This design difference means that the MySQL replication binary log is significantly more compact than the PostgreSQL WAL stream.
Doesn't sound like what you described at all.