Hacker News

Something I don't enjoy about remote/distributed locks is that unlike distributed transactions they're usually unable to provide any strict guarantees about things they protect.

E.g. if your algorithm is:

1) Hold the distributed lock

2) Do the thing

3) Release the lock

And the node goes dark for a while between steps 1 and 2 (e.g. under 100% CPU load), by the time it reaches 2 the lock may have already expired and another node may be holding it, resulting in a race. Adding steps like "1.1) double- or triple-check the lock is still held" obviously doesn't help, because the node can go dark right after that check and resume at 2. The probability of this is not high, but still: no guarantees. Furthermore, at a certain scale you actually do start seeing rogue nodes deemed dead hours ago suddenly coming back to life and doing unpleasant things.
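That failure mode is easy to reproduce with a toy in-process stand-in for a remote lock with a TTL lease (all names here are made up for illustration):

```python
import time

class LeaseLock:
    """Toy in-process stand-in for a remote lock with a TTL lease."""
    def __init__(self):
        self.owner, self.expires_at = None, 0.0

    def acquire(self, node, ttl):
        now = time.monotonic()
        if self.owner is None or now >= self.expires_at:
            self.owner, self.expires_at = node, now + ttl
            return True
        return False

    def held_by(self, node):
        return self.owner == node and time.monotonic() < self.expires_at

lock = LeaseLock()
assert lock.acquire("A", ttl=0.05)  # step 1: node A takes the lease
time.sleep(0.1)                     # A goes dark past its TTL (GC pause, CPU load...)
assert lock.acquire("B", ttl=10.0)  # the lease has expired, so B acquires legitimately
# A now wakes up and runs step 2 anyway. Even a held_by("A") check here
# doesn't close the window: A can stall again between the check and the write.
assert not lock.held_by("A")
```

With a real remote lock the window is the same, just harder to see: nothing stops the paused holder from resuming its work after the lease has moved on.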

The rule of thumb is usually "keep locks within the same transaction space as the thing they protect", and often you don't even need locks in that case; transactions alone can be enough. If you're trying to protect something that is inherently un-transactional then, well, good luck, because these efforts are always probabilistic in nature.

A good use-case for a remote lock is when it's not actually used to guarantee consistency or avoid races, but merely to prevent duplicate computation for cost/performance reasons. For all other cases I outright recommend avoiding them.



A lot of what you say is explained in detail in Martin Kleppmann's article[0]. As you said, there's no guarantee about when the lock will expire. The proper solution for this is a fencing token. The idea is similar to how people have used optimistic locking when updating data in a db to avoid two users overwriting each other's work.

[0]: https://martin.kleppmann.com/2016/02/08/how-to-do-distribute...
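The fencing idea can be sketched with a toy lock service and store (names hypothetical; in practice the storage layer itself has to check the token):

```python
class LockService:
    """Toy lock service: every grant carries a monotonically increasing token."""
    def __init__(self):
        self.token = 0

    def acquire(self):
        self.token += 1
        return self.token

class FencedStore:
    """Storage that rejects writes whose fencing token is older than one it has seen."""
    def __init__(self):
        self.max_token, self.value = 0, None

    def write(self, token, value):
        if token < self.max_token:
            raise RuntimeError("stale fencing token: this writer lost the lock")
        self.max_token, self.value = token, value

svc, store = LockService(), FencedStore()
t_a = svc.acquire()           # client A gets token 1, then stalls
t_b = svc.acquire()           # A's lease expires; client B gets token 2
store.write(t_b, "from B")    # B writes with the newer token
try:
    store.write(t_a, "from A")  # A wakes up and tries to write...
except RuntimeError:
    pass                        # ...and is safely rejected by the store
assert store.value == "from B"
```

The token turns "I think I hold the lock" into something the storage can verify, which is what the lease alone can't give you.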


Yes, exactly! We found out the hard way just how unreliable Redis-based locks are, and switched to Postgres locks. It works reliably since our code is already in a Postgres transaction.

Created a “lock” table with a single string key column, so you can “select key for update” on an arbitrary string key (similar UX to a Redis lock). I looked at advisory locks, but they don’t work when the lock key needs to be dynamically generated.
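For reference, a minimal sketch of that scheme (table and key names are my assumptions, not necessarily what they used):

```sql
-- One row per lockable key; rows are created lazily on first use.
CREATE TABLE IF NOT EXISTS lock (
    key text PRIMARY KEY
);

-- Inside the same transaction as the protected work:
INSERT INTO lock (key) VALUES ('report:2024-08') ON CONFLICT DO NOTHING;
SELECT key FROM lock WHERE key = 'report:2024-08' FOR UPDATE;
-- ...protected work here; the row lock is released at COMMIT/ROLLBACK.
```

Because the row lock is tied to the transaction, it can never outlive the work it protects, which is exactly the "same transaction space" property from upthread.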


After reading the current[1] top comment about Redlock, this was literally the next low-effort thing that came to mind, so I'm glad to find someone else's experiences with using a PostgreSQL table as a lock.

I will need a distributed lock soon, but I've never used one before so I'm taking this chance to learn about them.

[1]: https://news.ycombinator.com/item?id=41315621


If it goes dark a microsecond after #3 you might have an ambiguous success: the transaction was processed, but you didn't get a confirmation.

A lot of robust systems end up implementing their own bespoke WAL semantics on top of the system of record. You'd think we would have a formal solution for that by now.


We do have globally distributed ACID DBs like Spanner, CockroachDB, FoundationDB etc.


True. Even simple scenarios like "save a file in S3 IFF the S3 link is saved in Postgres", which show up in virtually any application, are rarely handled well.


Uggh, I was cornered into writing a couple of these over the years. The way I handled it was:

1. make sure both operations will be retried if they don't run to completion, and

2. think through how the rest of the system would react to one of them being present without the other

Then I used whichever of the two orderings was less bad from the perspective of #2. Obviously this depends on the exact use case -- I was simply lucky that the rest of the system was designed in such a way that it could tolerate that bad intermediate state.
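That ordering trick, in toy form, for the S3-plus-Postgres example upthread (dicts standing in for the blob store and the DB; the bucket name is made up):

```python
# Dicts standing in for the blob store and the database.
blob_store, db = {}, {}

def save_attachment(key, data):
    # Ordering chosen so the bad intermediate state is an orphaned blob
    # (harmless, can be garbage-collected later) rather than a DB row
    # pointing at a blob that was never written.
    blob_store[key] = data          # step 1: upload (retried by the caller on failure)
    db[key] = f"s3://bucket/{key}"  # step 2: record the link

save_attachment("img.png", b"...")
```

A crash between the two steps leaves only the orphaned upload, which step 1's retry (or a cleanup job) can deal with; the reverse ordering would leave a link to nothing, which the rest of the system usually can't tolerate.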





