It's not mentioned in the headline and not made super clear in the article: This...

ants_a · 2025-04-30T12:35:24 1746016524

I wonder why this magic would be needed. Vanilla Postgres does support quorum commit, which can do this. You can also set up the equivalent multi-AZ cluster with Patroni, and (modulo bugs) it does the necessary coordination to promote primaries in a way that does not lose transactions or make visible a transaction that is not durable.

There is still a Postgres deficiency that makes something similar to this pattern possible. Non-replicated transactions where the client goes away mid-commit become visible immediately. So in the example, if T1 happens on a partitioned leader and disconnects during commit, T2 also happens on a partitioned node, and T3 and T4 happen later on a new leader, you would also see the same result. However, this does not jibe with the statement that fault injection was not done in this test.

Edit: I did not notice the post explaining that this pattern can come from inconsistent commit order on replica and primary. Kind of embarrassing, given I've done a talk proposing how to fix that.

sontek · 2025-04-30T13:10:02 1746018602

Link the talk video

ants_a · 2025-04-30T20:24:11 1746044651

https://www.youtube.com/watch?v=vz-dhwSpjOw

ashu1461 · 2025-04-30T03:01:32 1745982092

I have one question.

If the snapshot violation happens inside Multi-AZ instances, can it also happen in a single-region setup with multiple read replicas? And is it just easier to observe in Multi-AZ setups because the lag is higher?

luhn · 2025-04-30T04:53:00 1745988780

A synchronous replica via WAL shipping is a well-worn feature of Postgres. I’d expect RDS to be using that feature behind the scenes and would be extremely surprised if that has consistency bugs.

Two replicas in a “semi synchronous” configuration, as AWS calls it, is to my knowledge not available in base Postgres. AWS must be using some bespoke replication strategy, which would have different bugs than synchronous replication and is less battle-tested.

But as nobody except AWS knows the implementation details of RDS, this is all idle speculation that doesn’t mean much.

wb14123 · 2025-04-30T08:40:58 1746002458

This kind of replication can be configured in vanilla Postgres with something like ANY 3 (s1, s2, s3, s4) in synchronous_standby_names? Doc: https://www.postgresql.org/docs/current/runtime-config-repli...
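For anyone unfamiliar with the quorum syntax: in postgresql.conf this would be synchronous_standby_names = 'ANY 3 (s1, s2, s3, s4)', meaning a commit waits until any three of the four listed standbys have acknowledged it. A toy sketch of that rule follows. The function name and the set of acknowledgements are made up for illustration, and this is not how Postgres implements it internally.

```python
def quorum_commit_acked(acked, standbys, num_sync):
    """Return True once at least num_sync of the listed standbys have acknowledged.

    Hypothetical helper: models only the counting rule of ANY n (...),
    not WAL positions, timeouts, or standby priorities.
    """
    listed = set(standbys)
    # Acknowledgements from standbys that are not in the list do not count.
    return len(listed & set(acked)) >= num_sync


standbys = ["s1", "s2", "s3", "s4"]

print(quorum_commit_acked({"s2", "s4"}, standbys, 3))        # False
print(quorum_commit_acked({"s1", "s2", "s4"}, standbys, 3))  # True
```

Note that the rule only says how many standbys must confirm before the commit returns. It says nothing about the order in which commits become visible to readers.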

ctapobep · 2025-04-30T11:32:27 1746012747

I don't think it's possible with an ANY setup. All you get is that some replicas are more outdated than others. But they won't return two conflicting states, where ReplicaA says tx1 wrote (but not tx2) while ReplicaB says tx2 wrote (but not tx1). That is what Long Fork and Parallel Snapshot are about.

So Amazon's Multi-AZ cluster seems to replicate changes out of order?
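To make the difference between "lagging" and "conflicting" concrete, here is a rough sketch. It assumes each replica's view can be summarized as the set of transactions it has seen, which is a big simplification (a real checker works from observed reads and writes, not ready-made sets). The function name is made up.

```python
def is_long_fork(snapshot_a, snapshot_b):
    """Return True if two snapshots cannot both be prefixes of one commit order.

    Hypothetical helper: each snapshot is the set of transactions visible to
    a reader. With a single commit order, one snapshot must contain the other.
    """
    a, b = set(snapshot_a), set(snapshot_b)
    # Each side sees something the other does not: the views have forked.
    return bool(a - b) and bool(b - a)


# A replica that is merely behind: its view is a subset of the other's.
print(is_long_fork({"tx1"}, {"tx1", "tx2"}))  # False

# ReplicaA saw tx1 but not tx2, ReplicaB saw tx2 but not tx1.
print(is_long_fork({"tx1"}, {"tx2"}))         # True
```

The first case is what plain replication lag looks like. The second is the shape I mean by Long Fork.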

mattashii · 2025-04-30T11:47:33 1746013653

Kinda. I think it's "just" PostgreSQL behaviour that's to blame here: On replicas, transaction commit visibility order is determined by the order of WAL records; on the primary it's based on when the backend that wrote the transaction notices that its transaction is sufficiently persisted.

See also my comment https://news.ycombinator.com/item?id=43843790 elsewhere in this thread
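A toy model of what I mean, with made-up numbers: two transactions whose commit records land in the WAL in one order, while their backends on the primary get around to marking them visible in the opposite order. The Txn class, field names, and timings are all hypothetical and only illustrate the ordering mismatch, not actual PostgreSQL internals.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Txn:
    name: str
    wal_position: int   # order of the commit record in the WAL
    visible_at: float   # hypothetical time the primary backend marks it visible


def primary_order(txns):
    # On the primary, visibility follows when each backend finishes its commit.
    return [t.name for t in sorted(txns, key=lambda t: t.visible_at)]


def replica_order(txns):
    # On a replica, visibility follows the order of commit records in the WAL.
    return [t.name for t in sorted(txns, key=lambda t: t.wal_position)]


txns = [
    Txn("T1", wal_position=1, visible_at=2.0),  # logged first, visible later
    Txn("T2", wal_position=2, visible_at=1.0),  # logged second, visible first
]

print(primary_order(txns))  # ['T2', 'T1']
print(replica_order(txns))  # ['T1', 'T2']

# A reader that catches each node after exactly one transaction became visible:
primary_snapshot = set(primary_order(txns)[:1])
replica_snapshot = set(replica_order(txns)[:1])
print(primary_snapshot, replica_snapshot)  # {'T2'} {'T1'}
```

The two readers disagree about which transaction came first, and neither view is just an older version of the other.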

x0x0 · 2025-04-30T18:07:07 1746036427

it's the 2nd sentence in the article:

> We show that Amazon RDS for PostgreSQL multi-AZ clusters violate Snapshot Isolation

you kind of have to expect people to read

evil-olive · 2025-04-30T18:18:35 1746037115

I think it's still an important clarification, because for years you've had a choice in RDS (classic RDS, not Aurora) between "single-AZ" and "multi-AZ" instances, with the general rule of thumb that production workloads should always be multi-AZ.

however, "multi-AZ" has been made ambiguous, because there are now multi-AZ instances and multi-AZ clusters.

...and your multi-AZ "instance", despite being not a multi-AZ "cluster" from AWS's perspective, is still two nodes that are "clustered" together and treated as one logical database from the client connection perspective.

see [0] and scroll down to the "availability and durability" screenshot for an example.

0: https://aws.amazon.com/blogs/aws/amazon-rds-multi-az-db-clus...