What Write Skew Looks Like (justinjaffray.com)
89 points by foldU on May 24, 2018 | 19 comments



Well-written and detailed comparison of snapshot and serializable isolation levels.

From the article:

A more concrete reason than “nothing else makes sense” is a question of local vs. global reasoning. If a set of transactions must maintain some kind of invariant within the database (for instance, that the sum of some set of fields is always greater than zero), then in a database that guarantees serializability, it’s sufficient to verify that every individual transaction maintains this invariant on its own. With anything less than serializability, including Snapshot, one must consider the interactions between every transaction to ensure said invariants are upheld. This is a significant increase in the amount of work that must be done (though in reality, I think the situation is that people simply don’t do it), a point made by Alan Fekete in this talk[1] on isolation.

This bears repeating. In any non-trivial transactional application, the combination of concurrency and non-serializable transaction isolation levels is a serious bug farm. Most app/service backends fall into this trap.
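As a minimal sketch of the kind of write skew that quoted invariant is about (the table, column names, and numbers here are made up purely for illustration), consider two withdrawals running under snapshot isolation (REPEATABLE READ in Postgres), each of which preserves "the sum stays positive" against its own snapshot:

    -- hypothetical schema: the invariant is that the balances always sum to a positive number
    CREATE TABLE accounts (id int PRIMARY KEY, balance int);
    INSERT INTO accounts VALUES (1, 50), (2, 50);

    -- session 1
    BEGIN ISOLATION LEVEL REPEATABLE READ;
    SELECT sum(balance) FROM accounts;                         -- sees 100; withdrawing 60 looks safe
    UPDATE accounts SET balance = balance - 60 WHERE id = 1;

    -- session 2, concurrently
    BEGIN ISOLATION LEVEL REPEATABLE READ;
    SELECT sum(balance) FROM accounts;                         -- also sees 100
    UPDATE accounts SET balance = balance - 60 WHERE id = 2;
    COMMIT;

    -- session 1
    COMMIT;
    -- The write sets are disjoint, so both commit, and the sum is now -20.
    -- Under SERIALIZABLE, one of the two would instead abort with a serialization failure.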

Unfortunately, most OLTP database systems have incredibly poor implementations of the serializable isolation level. Postgres is a notable exception, and it's nice to see some newer database systems--like CockroachDB--making progress in this area.

[1] https://www.youtube.com/watch?v=IP-S_RHlsEQ (Thanks for the link; I hadn't seen this talk before.)


There's a quote by Peter Bailis that's gone around:

"Despite the ubiquity of weak isolation, I haven’t found a database architect, researcher, or user who’s been able to offer an explanation of when, and, probably more importantly, why isolation models such as Read Committed are sufficient for correct execution. It’s reasonably well known that these weak isolation models represent “ACID in practice,” but I don’t think we have any real understanding of how so many applications are seemingly (!?) okay running under them."

http://www.bailis.org/blog/understanding-weak-isolation-is-a...


Thanks for the citation, root!

I agree, in spirit, with all three of his hypotheses for why the situation doesn't seem to be as dire as one might otherwise expect. Peter writes that "it's possible data is actually corrupted, and apps don't care". That's true, of course, but it's also that the corruption is not always prominently visible.

Historically, improper concurrency control (e.g. in multithreaded programs written in unsafe languages) could quickly result in much more than logical invariant violations. Physical data corruption and memory access violations were (and still are) prevalent, with readily apparent results.

Modern database systems, however, aren't going to crash when you hit a serialization anomaly. They're not going to physically corrupt your data: your numbers will still be numbers, your strings will still be strings, and your foreign keys will still be sound.

We do occasionally bear witness to prominent failures that stem from weak database isolation levels: "oh, look at that cryptocurrency exchange backed by [database du jour], Eve was able to overdraw her account by issuing a bunch of concurrent requests. What amateurs!" and we all point and laugh from our houses of glass.

In truth, it's not the anomaly that's abnormal, it's the visibility. Those kinds of anomalies happen all the time, even to experienced practitioners, but the failure modes will generally be much more obscure and application-specific. Most aren't going to manifest as some exception in your logs, let alone as a top post on HN. You'll only find out about the problem when a ticket comes in saying "hey, the summary and itemized reports don't add up" or "this section is overflowing because the user has 6 email addresses and we told the designers that the application limits it to 5". That's how serialization anomalies usually manifest, and it's a death-by-a-thousand-cuts situation, because each occurrence saps precious time from everyone involved.

We have to do better.


Agreed! Peter has actually done some really great research into this, finding real vulnerabilities in applications due to insufficient transaction isolation: http://www.bailis.org/papers/acidrain-sigmod2017.pdf


>If a set of transactions must maintain some kind of invariant within the database (for instance, that the sum of some set of fields is always greater than zero), then in a database that guarantees serializability, it’s sufficient to verify that every individual transaction maintains this invariant on its own. With anything less than serializability, including Snapshot, one must consider the interactions between every transaction to ensure said invariants are upheld.

This is a difference between practice and theory. The invariant mentioned above is of theoretical interest. In practice, such an invariant would look like "the sum of some fields is always equal to the value in that field of that row" (it is left as an exercise to the reader to see that the former theoretical invariant, and indeed any other, can, without loss of generality, always be implemented as this practical invariant with a target field :). Since even basic Read Committed still means serialization of write access to that "sum" target field, it is enough to verify only that each transaction writing to any of those fields is also writing to that "sum" field and that it does uphold the invariant. There is no need to pay the huge performance price of serializing those transactions with all the rest of the transactions in the system. This is why Read Committed is verifiably enough for the correct execution of so many applications (i.e., for any application whose invariants are implemented in that "practical" form). It is basically like multi-threaded programming with correctly implemented synchronization of access to shared data.


> it is enough to verify only that each transaction writing to any of those fields is also writing to that "sum" field and that it does uphold the invariant

This is a good try at a theory of when you can use weak isolation. But as stated, I believe it is false. If the invariant is A+B=C, I think this sequence is permitted by READ COMMITTED:

    T1: read A=1
    T2: read B=1
    T2: write A=2
    T2: write C=3
    T2: commit
    T1: write B=10
    T1: write C=11
    T1: commit

Now A=2, B=10, C=11.
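Concretely, that sequence corresponds to something like the following two sessions under READ COMMITTED (the single-row table t and its columns are just an illustrative stand-in for A, B, and C):

    -- hypothetical setup; the invariant is a + b = c
    CREATE TABLE t (id int PRIMARY KEY, a int, b int, c int);
    INSERT INTO t VALUES (1, 1, 1, 2);

    -- T1
    BEGIN;                                      -- READ COMMITTED is the default level
    SELECT a FROM t WHERE id = 1;               -- reads a = 1

    -- T2
    BEGIN;
    SELECT b FROM t WHERE id = 1;               -- reads b = 1
    UPDATE t SET a = 2, c = 3 WHERE id = 1;
    COMMIT;

    -- T1, continued
    UPDATE t SET b = 10, c = 11 WHERE id = 1;   -- silently overwrites T2's c = 3
    COMMIT;
    -- final row: a = 2, b = 10, c = 11, so a + b <> c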

More to the point, invariants aren't always local to a row, and people don't usually even fully articulate them.

> There is no need to pay the huge performance price of serializing those transactions with all the rest of transactions in the system

Serializable execution can be made very very fast when there aren't actually a lot of conflicts. And when there are, you can do the hard work of trying to prove that something weaker will work. The failure mode in the other direction is silent corruption of your data.


What I was talking about, in its simplest implementation (though not the most performant/scalable one, which would be more complicated, so we won't go into it now), would look like this: the transactions involved with an invariant should start by locking the "target" field of the invariant, the field C in your example:

    T1: lock C
    T2: attempt to lock C - blocked until T1 completes.

I.e., we have these 2 transactions serialized between themselves, without unnecessary extra serialization with all the rest of the transactions in the system. The "target" field just plays the role of the mutex associated with the invariant (and this is why any invariant can be converted to that form: just have a mutex, i.e. a "target" field, associated with it). From the theory of multi-threaded programming with mutexes we know that everything can be implemented correctly in such a model. I.e., any application can be made correct with READ COMMITTED isolation if the required set of "mutexes" and their acquire/release is correctly implemented.
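A rough sketch of that "target field as mutex" pattern in SQL, assuming a single-row table t with columns a, b, c standing in for your A, B, C (all names made up): every transaction touching any of those fields first locks the row holding the target field, which serializes those transactions against each other even under plain READ COMMITTED:

    BEGIN;                                        -- READ COMMITTED
    SELECT c FROM t WHERE id = 1 FOR UPDATE;      -- "lock C": a concurrent transaction
                                                  -- doing the same blocks here
    UPDATE t SET a = 2, c = 3 WHERE id = 1;       -- re-establish a + b = c while holding the lock
    COMMIT;                                       -- lock released at commit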

>Serializable execution can be made very very fast when there aren't actually a lot of conflicts.

Not exactly. Serializability is by itself a pretty expensive global invariant, and your application would be paying for it in performance and scalability even though it may not really need it.


> any application can be made correct with READ COMMITTED isolation if the required set of "mutexes" and their acquire/release is correctly implemented

This is true, but it's "a big if." And the result might be faster than a good implementation of serializable isolation (since the latter is working to maintain some invariants that you don't actually care about), but it could easily be much slower: very broad invariants will require very broad locks which cause much more "serialization" than is required for serializability.

Let's say you want the same sum invariant but across rows: all users' balances must sum to zero. The straightforward application of your reduction is to essentially have a global mutex for all balance transactions, but this is nowhere near the most scalable solution. Whereas a good serializable database will handle this situation scalably, safely, and automatically.
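To make that concrete (table, column, and lock-row names below are entirely hypothetical): the Read Committed reduction needs every balance-touching transaction to first grab a single shared "mutex" row, whereas under serializable each transfer just touches the rows it actually needs and lets the database detect the dangerous interleavings:

    -- Read Committed reduction: a global mutex row guarding the "balances sum to zero" invariant
    BEGIN;
    SELECT 1 FROM invariant_locks WHERE name = 'balances_sum_zero' FOR UPDATE;
    UPDATE balances SET amount = amount - 100 WHERE user_id = 1;
    UPDATE balances SET amount = amount + 100 WHERE user_id = 2;
    COMMIT;

    -- Serializable: no explicit lock; the database aborts only those interleavings that
    -- would actually violate serializability (the application retries on serialization failure)
    BEGIN ISOLATION LEVEL SERIALIZABLE;
    UPDATE balances SET amount = amount - 100 WHERE user_id = 1;
    UPDATE balances SET amount = amount + 100 WHERE user_id = 2;
    COMMIT;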


In my experience, many useful invariants cannot be trivially expressed in a singular "target field" against which one may lock, but that's actually beside the point.

The point is that--insofar as it comes to correctness--serializability allows local reasoning, whereas weaker isolation levels require global reasoning. You can't argue against that by saying "well, if every transaction just explicitly locks the appropriate records at the appropriate times …" because that's exactly the type of global reasoning that we're trying to avoid.

There certainly is a difference between theory and practice here, but I don't think it's in favor of weaker isolation levels.

In theory, as you note, one can enforce any invariant–even in Read Committed–by taking a global lock in every transaction. In practice, nobody does that, because it takes concurrency to zero.

In theory, one can regain concurrency by judiciously breaking that global lock into some application-specific hierarchy of locks, and by having the discipline to adhere to that bespoke locking protocol for every transaction, now and in perpetuity. In practice, that is both too expensive and too error prone to be applied successfully in any non-trivial system (though some do still try).

In theory, these concerns can be addressed at the application level. In practice, it is grossly irresponsible that we (as an industry) continue to foist these concerns on application developers who–in the aggregate–cannot reasonably be expected to get it right. The number of viable database systems is small and growing slowly, while the number of database-backed applications is large and growing rapidly. To have any hope of improving the status quo, well: I know where I'd put my money.


>In my experience, many useful invariants cannot be trivially expressed in a singular "target field" against which one may lock

just trivially dedicate/associate a field with each invariant :)

>The point is that--insofar as it comes to correctness--serializability allows local reasoning, whereas weaker isolation levels require global reasoning. You can't argue against that by saying "well, if every transaction just explicitly locks the appropriate records at the appropriate times …" because that's exactly the type of global reasoning that we're trying to avoid.

The reasoning required is the same in both cases. To avoid serialization conflicts, you still have to identify all those appropriate records and appropriate times in your reasoning. So the choice is either to explicitly implement the required locking for RC, or to let SERIALIZABLE do it implicitly under the hood (and hope that it will do it better :).

>one can enforce any invariant–even in Read Committed–by taking a global lock in every transaction. In practice, nobody does that, because it takes concurrency to zero.

In a naive implementation of what I said (in practice, of course, it would be more complex and better as a result), the worst-case concurrency is the number of disjoint sets of invariants. And SERIALIZABLE would give about the same concurrency, or worse.

>In practice, that is both too expensive and too error prone to be applied successfully in any non-trivial system (though some do still try).

In my experience, no non-trivial system will run under SERIALIZABLE without hitting conflicts until the system has been subjected to a lot of that "global reasoning" you mentioned. And flushing out all those cases is very time- and effort-consuming, because it is "global" reasoning about implicit locking and relations. The explicit lock-based model for RC is a simpler mode of thinking, and thus much more practicable and tractable. And this is why we all run RC. At least it runs, and if some reasoning was put into it, it runs correctly, fast, and scalably.


> The reasoning required is the same in both cases. To avoid serialization conflicts, you still have to identify all those appropriate records and appropriate times in your reasoning.

Well, sort of. Serializability allows you to reason locally for correctness, but not for performance. That is, however, a meaningful distinction.

Serializability remains a marked improvement over the status quo. Most operations aren't actually performance sensitive, so that tradeoff makes perfect sense. With serializability, we need only focus our efforts on those operations which are both performance sensitive and highly contended.

This has nice emergent properties, because the problematic transactions are immediately apparent: they are the slow ones, the ones that are not meeting their performance budget.

Performance anomalies are directly measurable and immediately visible. Correctness anomalies are generally neither.

Performance anomalies are generally localized and ephemeral. Correctness anomalies are often persistent and viral.

> So the choice is either to explicitly implement the required locking for RC, or to let SERIALIZABLE do it implicitly under the hood (and hope that it will do it better :).

Again, when it comes to correctness, there's no real hope to do better than serializable. When it comes to performance: sure. But in every application domain I've worked, getting to a wrong result faster is rarely useful. (I understand that this is not a universal truth, but I believe it is significantly more common than the alternative. The adage "first make it work, then make it fast" comes to mind.)

> In my experience, no non-trivial system will run under SERIALIZABLE without hitting conflicts until the system has been subjected to a lot of that "global reasoning" you mentioned.

I agree that there have been (and still are) practical impediments to deploying large-scale serializable systems with incumbent technologies. But I disagree that those impediments are inherent to all implementations of the serializable isolation level, which is why these kinds of articles (and the R&D behind them) are so important.

We still have a long way to go before serializable-by-default is a tenable option for the most demanding of systems, but that is what we should be working towards.


> Postgres is a notable exception

Did you have good experiences with postgres' serializable mode? When I tried to do some TPC-C benchmarking with postgres set explicitly in serializable mode, it would fall over almost instantly (read: not able to get beyond ~100 warehouses). I'd love to read anything you have to say about getting good performance out of Postgres in serializable mode, because I was unable to find this promised land myself.


I have some experience with that specific question, although it was a few years ago...

Do you remember what conflict was causing things to fall over?

In general, the important parameters to tweak are:

- max_pred_locks_per_transaction may need to be increased; otherwise locks will switch to coarse granularity to save room in the lock table

- for tables that fit in memory, the planner may choose a sequential scan even when an index scan is available, which can be faster but creates more conflicts under a serializable workload. Increasing cpu_tuple_cost should avoid that (or even just enable_seqscan=off to force indexes whenever available); example settings are sketched below
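Something along these lines (the values are just illustrative starting points, not recommendations; max_pred_locks_per_transaction only takes effect after a restart):

    -- raise the predicate-lock budget so SIREAD locks aren't promoted to coarser granularity
    ALTER SYSTEM SET max_pred_locks_per_transaction = 256;   -- default is 64

    -- per-session planner nudges to prefer index scans over sequential scans
    SET cpu_tuple_cost = 0.05;     -- default 0.01; makes sequential scans look more expensive
    SET enable_seqscan = off;      -- blunter: use an index scan whenever one is available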


I actually don't have much hands-on experience with Postgres, so I can't speak to how it performs in practice.

My comment was highlighting that Postgres' implementation of SSI is at least a better starting point than most other purely pessimistic implementations of the serializable isolation level. It does not surprise me, however, to hear that there are still performance deficiencies in practice. For example, the Postgres implementation is imprecise, which will result in spurious transaction aborts (those will presumably be retried, but it comes at a cost to latency and throughput). And even though reads are optimistic (which will perform better than pessimistic concurrency control, if contention rates are sufficiently low), maintaining transaction dependency metadata can still have a non-negligible cost.

I also think Postgres suffers from other more dubious choices--like a lack of clustered indexes--which can exacerbate things.

My work in this area has focused on revisiting various architectural and implementation design decisions which have contributed to the current state of affairs. I think the relational data model is generally the right choice, but most relational databases have some godawful characteristics around performance, scalability, reliability, and programmability which limit their efficacy in practice.


I like Jim Gray's simple explanation (as always) of the difference.

A database has two white marbles and two black marbles. Transaction T1: Update all white to black. Transaction T2: Update all black to white.

A serializable database will ensure that the transactions happen in some serial order. At the end of performing them one after another, either all marbles are white or all black.

In snapshot isolation, T1 updates only the white marbles and T2 updates only the black marbles, so they are not seen as conflicting. The end result is that the database contains 2 black marbles (formerly white) and 2 white marbles (formerly black), because the two transactions are each working off a snapshot and not conflicting.
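A quick sketch of the marble example in Postgres syntax (REPEATABLE READ is Postgres' snapshot isolation; the schema is made up for illustration):

    CREATE TABLE marbles (id int PRIMARY KEY, color text);
    INSERT INTO marbles VALUES (1, 'white'), (2, 'white'), (3, 'black'), (4, 'black');

    -- T1
    BEGIN ISOLATION LEVEL REPEATABLE READ;
    UPDATE marbles SET color = 'black' WHERE color = 'white';   -- touches rows 1 and 2

    -- T2, concurrently
    BEGIN ISOLATION LEVEL REPEATABLE READ;
    UPDATE marbles SET color = 'white' WHERE color = 'black';   -- touches rows 3 and 4
    COMMIT;

    -- T1
    COMMIT;
    -- The write sets are disjoint, so both commit and the colors are simply swapped.
    -- Under ISOLATION LEVEL SERIALIZABLE, one of the two would instead abort with a
    -- serialization failure.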


If you are interested in a similar discussion of consistency levels, write skew, and how FaunaDB allows indexes to opt-in to serializability, this blog post covers a lot of ground, but brings up index write skew toward the end: https://blog.fauna.com/acid-transactions-in-a-globally-distr...



Pictures are now 404 for me @justinjaffray.com, so while the crdb link might not be canonical, right now it is a bit more useful.


Sorry, that was my fault! Should be fixed now.



