Efrim-Lipkin's comments

Efrim-Lipkin · on March 15, 2017

Hi guys. Can I ask a simple question?

I understand that we are talking about a globally distributed, serverless and yet consistent relational database.

My question is about latency. How long does it take for transactional atomicity to become a consistent read on a globally distributed database? (1) And what are the measures taken between entry nodes to prevent clients from recieving inconsistent data? (2)

As I ponder this, I am struck by not the consistency problem, as that is solvable. But I am struck by the latency problem of assuring that all global queries are consistent for some (any) time quanta. What sort of latency should be expected?

both questions (1) and (2) are interesting, but (1) is critical while (2) is academic.

Thanks, and very interesting work guys.

EL

freels · on March 15, 2017

FaunaDB has a per-database replicated transaction log. Once a transaction has been globally committed to the log, it is applied to each local partition that covers part of the transaction. By this point, the transaction's order with respect to others in a database and results are determined. While writes require global coordination to commit, reads across partitions are coordinated via a snapshot time per query, which guarantees correctness.

In short, writes require a global round-trip through the transaction pipeline; reads are local and low latency.

Efrim-Lipkin · on March 15, 2017

This is a very good answer. So if I understand you correctly (please correct if I do not), atomicity is handled on a per connection basis (writes cannot be distrubted). And there may be high latency in distributing a transaction, but read consistency is guaranteed by timestamp (equivalent to versioning).

Is this correct?

EL

freels · on March 16, 2017

One thing that makes this easier is that FaunaDB does not support session transactions, rather you must express your transaction logic as a single Fauna query, which is executed atomically. Transactions can still involve arbitrary keys, however.

And yes, for reads, by default the coordinating node chooses a timestamp and uses that to query all involved data partitions. Each partition will respond with the requested data as of at that timestamp, or will delay responding until it has caught up.

One nice thing about this approach is that any chosen timestamp is enough to provide a consistent snapshot view of the dataset at that time. This ends up being useful for bulk or incremental reads, where a longer running process needs a stable view of the dataset.

ewrcoffee · on March 17, 2017

Without session transaction, how does the application performs transactional read modify write?

Efrim-Lipkin · on March 15, 2017

Okay, Last question because I'm in Asia and it's 7:30 and I am tired. (But yea, I'm a Midwestern computer scientist who just happens to be in Asia ;-)

Question #2 -- Two, or Three, or More transactions occur simultaneously that want to change data D. each of these transactions send out transaction logs that contradict each other. What happens?

EL

felixgallo · on March 15, 2017

snapshot time as in wall clock time? What is the time source?

freels · on March 16, 2017

It's variable, and the client can provide it. By default, we use the greater of wall clock time or the highest timestamp transaction timestamp the node serving the query has seen.

felixgallo · on March 16, 2017

In that case, your database loses data and is definitely not strongly consistent. Why do you claim it is?

freels · on March 16, 2017

To be clear, read-write transactions are applied in a consistent order derived from the transaction log. Read-only transactions can rely on the fact that read-write transactions are totally ordered, and execute without global coordination while still providing a guarantee of serializability. Wall clock time is used as a suggestion only, in order for the database's logical timestamps to track as close to real time as possible. We will have more information about how Fauna meets its consistency guarantees in a future blog post.

felixgallo · on March 16, 2017

you can't have a read transaction that's serializable that doesn't go through global coordination unless all of the data is guaranteed to only ever be local. Looking forward to the blog post.

anentropic · on March 16, 2017

looking forward to Jepsen...

elvinyung · on March 15, 2017

Looks like FaunaDB uses Raft[1], so I'd expect that data is sharded into multiple consensus groups, like Spanner or Megastore. That would mean consistency on a single shard/consensus group is basically just dependent on reading from and writing to the Raft leader.

1: https://news.ycombinator.com/item?id=13645876