In the systems you propose, is it possible to store multiple logical transactions per IO operation? My batching approach lets me reason in terms of "business operations per disk IO" (a positive integer greater than 1 in many cases).
To get a better idea: assume a 4K block size and a 128-byte size for some business type. You can hypothetically store 32 of these per physical write if you have perfect batching going on. Looking at a Samsung 960, which has ~2.1 GB/s sequential write speed (approximately our use case), you would be able to persist ~16.4 million transactions per second in the most ideal scenario. This is extremely notable, because the maximum stated random write throughput for this device at QD32 with 4K blocks is only ~440k ops/s. Mix in even the most pedestrian of compression algorithms and that 16M turns into an even more ridiculous figure, assuming your transactions look somewhat similar to each other, the system is fully loaded, and the planets are appropriately aligned. These circumstances might sound extreme, but they are fairly common in areas like fintech, where you might need to match millions of orders in ~1 second and it's non-stop like this for hours.
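For concreteness, here is that back-of-the-envelope math as a runnable sketch (the constants are the ones above; everything else is illustrative):

```cpp
// Back-of-envelope check of the figures above.
#include <cstdint>
#include <cstdio>

int main() {
    const uint64_t block_size    = 4096;            // 4K physical write
    const uint64_t record_size   = 128;             // one business record
    const uint64_t seq_write_bps = 2'100'000'000;   // ~2.1 GB/s sequential

    const uint64_t records_per_block = block_size / record_size;  // 32
    const uint64_t blocks_per_sec    = seq_write_bps / block_size; // ~512K
    const uint64_t tx_per_sec        = blocks_per_sec * records_per_block;

    std::printf("%llu records/block * %llu blocks/s = ~%.1fM tx/s\n",
                (unsigned long long)records_per_block,
                (unsigned long long)blocks_per_sec,
                tx_per_sec / 1e6);  // prints ~16.4M tx/s
}
```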
I am not aware of how LSM+WAL would meet the objectives I have laid out above, most notably because of the implication that the disk is being touched multiple times per business transaction. Please correct me if I am mistaken in this regard. My solution ensures that the disk is touched <= 1 time per logical write (where the size of the write is bounded by the block size of the device). A sketch of what I mean is below.
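A minimal sketch of the kind of block-aligned batching I mean (names and error handling are illustrative, not my actual implementation):

```cpp
// Records accumulate in a block-sized buffer; a full buffer costs exactly
// one write() + one fsync(), so each logical write shares <= 1 physical IO.
#include <unistd.h>
#include <cstddef>
#include <cstring>

constexpr size_t kBlockSize  = 4096;
constexpr size_t kRecordSize = 128;

class BlockBatcher {
public:
    explicit BlockBatcher(int fd) : fd_(fd) {}

    // Append one record; flush when the block fills (32 records here).
    void append(const char record[kRecordSize]) {
        std::memcpy(buf_ + used_, record, kRecordSize);
        used_ += kRecordSize;
        if (used_ == kBlockSize) flush();
    }

    // One sequential, block-sized write + one fsync for the whole batch.
    // A partially filled block goes out zero-padded; errors are elided.
    void flush() {
        if (used_ == 0) return;
        ::write(fd_, buf_, kBlockSize);
        ::fsync(fd_);
        used_ = 0;
        std::memset(buf_, 0, kBlockSize);
    }

private:
    int fd_;
    size_t used_ = 0;
    char buf_[kBlockSize] = {};
};
```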
Yes, when you write to a file it doesn't actually hit the disk; it stays in a kernel buffer until the buffers grow too large or fsync (or a variant) is explicitly called. For example, in RocksDB you'd issue a few writes and then call SyncWAL() to actually perform the IO and durably commit to disk (or issue a final write with sync=true).
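Roughly like this with the C++ API (the path and keys are placeholders):

```cpp
#include <rocksdb/db.h>
#include <cassert>

int main() {
    rocksdb::DB* db;
    rocksdb::Options options;
    options.create_if_missing = true;
    rocksdb::Status s = rocksdb::DB::Open(options, "/tmp/wal_demo", &db);
    assert(s.ok());

    rocksdb::WriteOptions wo;   // sync defaults to false: buffered WAL writes
    db->Put(wo, "k1", "v1");
    db->Put(wo, "k2", "v2");

    s = db->SyncWAL();          // one sync covers both preceding writes
    assert(s.ok());

    // Alternatively: make only the last write synchronous, which also
    // makes the earlier buffered WAL entries durable.
    rocksdb::WriteOptions sync_wo;
    sync_wo.sync = true;
    db->Put(sync_wo, "k3", "v3");

    delete db;
}
```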
This is not something LSMs specifically implement; it's just how kernels do file IO.
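You can see the same behavior with nothing but POSIX calls (the path is a placeholder):

```cpp
// write() only copies into the kernel page cache; data reaches the device
// when the kernel flushes it, or when fsync()/fdatasync() is called.
#include <fcntl.h>
#include <unistd.h>
#include <cstring>

int main() {
    int fd = ::open("/tmp/demo.log", O_WRONLY | O_CREAT | O_APPEND, 0644);
    if (fd < 0) return 1;

    const char rec[] = "logical write\n";
    ::write(fd, rec, std::strlen(rec));  // buffered in the page cache
    ::write(fd, rec, std::strlen(rec));  // still buffered

    ::fdatasync(fd);  // now both records are durable, from a single flush

    ::close(fd);
}
```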
RocksDB also does additional IO coalescing for concurrent writes, though that's more about reducing syscall cost (one write() per write group instead of one per write) than IO cost.
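Something in the spirit of this sketch (a simplified leader/follower scheme, not RocksDB's actual code; followers here return without waiting for durability):

```cpp
// Concurrent writers park their payloads in a shared batch; one "leader"
// thread drains the whole group with a single write(), amortizing the
// syscall cost across every writer in the group.
#include <unistd.h>
#include <mutex>
#include <string>
#include <vector>

class WriteGroup {
public:
    explicit WriteGroup(int fd) : fd_(fd) {}

    void submit(std::string payload) {
        std::unique_lock<std::mutex> lk(mu_);
        pending_.push_back(std::move(payload));
        if (leader_active_) return;  // the current leader will drain this too
        leader_active_ = true;
        while (!pending_.empty()) {
            std::string batch;
            for (auto& p : pending_) batch += p;
            pending_.clear();
            lk.unlock();
            ::write(fd_, batch.data(), batch.size());  // one syscall per group
            lk.lock();  // re-check for payloads queued during the write
        }
        leader_active_ = false;
    }

private:
    int fd_;
    std::mutex mu_;
    std::vector<std::string> pending_;
    bool leader_active_ = false;
};
```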
> For example, in RocksDB you'd issue a few writes and then call SyncWAL() to actually perform the IO and durably commit to disk (or issue a final write with sync=true).
Ok, that makes sense. I think we are mostly on the same page here. My equivalent of SyncWAL is invoked naturally each time my buffer reader hits the barrier and dumps the current batch of items to be processed.