The really key part seems to be this: *"If you aren’t using mmap, on the other h...

bluestreak · on Jan 14, 2022

QuestDB's author here. I do share Ayende's sentiment. There are things the OP paper doesn't mention that can help mitigate some of the disadvantages:

- single-threaded calls to 'fallocate' will help avoid sparse files and SIGBUS during memory writes
- over-allocating, caching memory addresses, and minimizing OS calls
- transactional safety can be implemented via a shared memory model
- hugetlb can minimize TLB shootdowns
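A minimal sketch of the first two points, assuming Linux/POSIX and Python's standard `os` and `mmap` modules: space is reserved with `os.posix_fallocate` before it is mapped, so running out of disk surfaces as an `OSError` at allocation time rather than as a SIGBUS on a later store into a hole, and the file grows in large chunks so remapping and syscalls stay rare. The class name `AppendFile` and the 1 MiB `CHUNK` are hypothetical choices for illustration, not QuestDB's code. It is single-threaded, and because the mapping is recreated on growth, anything that caches slices of the old mapping would have to refresh them.

```python
import mmap
import os

# Hypothetical growth granularity: reserve space in large chunks so the
# file (and its mapping) is resized rarely. 1 MiB is illustrative only.
CHUNK = 1024 * 1024


class AppendFile:
    """Single-threaded append-only file written through a shared mmap.

    Blocks are reserved with posix_fallocate() before they are mapped, so
    stores never land in a hole of a sparse file.
    """

    def __init__(self, path: str):
        self.fd = os.open(path, os.O_RDWR | os.O_CREAT, 0o644)
        self.capacity = 0   # bytes reserved and mapped
        self.size = 0       # bytes actually written
        self.map = None     # cached mapping, reused until the next growth
        self._grow(CHUNK)

    def _grow(self, min_capacity: int) -> None:
        # Over-allocate: at least double, rounded up to a CHUNK multiple.
        new_cap = max(self.capacity * 2, min_capacity)
        new_cap = -(-new_cap // CHUNK) * CHUNK
        # Reserve real disk blocks now; ENOSPC shows up here as OSError
        # instead of as SIGBUS on a later store through the mapping.
        os.posix_fallocate(self.fd, 0, new_cap)
        if self.map is not None:
            self.map.close()
        # Default flags on Unix: MAP_SHARED, PROT_READ | PROT_WRITE.
        self.map = mmap.mmap(self.fd, new_cap)
        self.capacity = new_cap

    def append(self, data: bytes) -> int:
        """Copy data into the mapping and return the offset it was written at."""
        if self.size + len(data) > self.capacity:
            self._grow(self.size + len(data))
        off = self.size
        self.map[off:off + len(data)] = data
        self.size += len(data)
        return off

    def close(self) -> None:
        self.map.flush()
        self.map.close()
        os.ftruncate(self.fd, self.size)  # drop the unused reservation
        os.close(self.fd)
```

Doubling plus chunk rounding keeps the number of `posix_fallocate`/`mmap` calls logarithmic in the final file size.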

I personally have no regrets about using mmap, because of all the benefits it provides.

AdamProut · on Jan 14, 2022

I suppose. Some problems with mmap() are a bit hard to fix from user land, though. You will hit contention on locks inside the kernel (mmap_sem) if the database makes high-throughput concurrent mmap()/munmap() calls. I don't follow Linux kernel development closely enough to know if this has improved recently, but it was easy to reproduce 4-5 years ago.
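To make the failure mode concrete, here is a sketch of the access pattern that triggers it: every read creates and destroys its own mapping, so each request makes an mmap()/munmap() pair that modifies the process address space. The function names are hypothetical. In CPython the GIL serializes much of the surrounding work, so this shows the pattern rather than reliably reproducing the kernel contention; a C version with many threads would show it more clearly.

```python
import mmap
import os
import threading

# mmap offsets must be a multiple of this (the page size on Linux).
GRAN = mmap.ALLOCATIONGRANULARITY


def read_with_fresh_mapping(fd: int, offset: int, length: int) -> bytes:
    """Anti-pattern: map, copy out, unmap for every single read.

    Each mmap()/munmap() changes the process address space, which on Linux
    takes the per-process mmap_sem (renamed mmap_lock in 5.8) for writing,
    so concurrent callers serialize inside the kernel.
    """
    base = offset - offset % GRAN
    with mmap.mmap(fd, offset - base + length, offset=base,
                   access=mmap.ACCESS_READ) as m:
        return m[offset - base:offset - base + length]


def hammer(path: str, threads: int = 8, reads_per_thread: int = 1000) -> int:
    """Run the anti-pattern from several threads; return how many reads succeeded."""
    fd = os.open(path, os.O_RDONLY)
    size = os.fstat(fd).st_size  # assumed to be larger than 16 bytes
    done = [0] * threads

    def worker(i: int) -> None:
        for j in range(reads_per_thread):
            off = (i * 7919 + j * 4096) % (size - 16)  # arbitrary spread of offsets
            if len(read_with_fresh_mapping(fd, off, 16)) == 16:
                done[i] += 1

    ts = [threading.Thread(target=worker, args=(i,)) for i in range(threads)]
    for t in ts:
        t.start()
    for t in ts:
        t.join()
    os.close(fd)
    return sum(done)
```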

ayende · on Jan 14, 2022

Almost no one is going to make a lot of mmap calls.

You map the file once, then fault it in.
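The alternative described here can be sketched the same way, assuming Python's `mmap` module on a POSIX system: one mapping for the life of the file, an optional `MADV_WILLNEED` hint (only where the platform exposes it), and an explicit pass that touches one byte per page so faults happen up front instead of on the query path. `MappedFile` is a hypothetical name, and the file must be non-empty, since an empty file cannot be mapped.

```python
import mmap
import os


class MappedFile:
    """Map a (non-empty) file once and serve every read from that mapping."""

    def __init__(self, path: str):
        self._f = open(path, "rb")
        self.size = os.fstat(self._f.fileno()).st_size
        # A single mmap() call for the lifetime of the file.
        self.map = mmap.mmap(self._f.fileno(), self.size, access=mmap.ACCESS_READ)
        if hasattr(mmap, "MADV_WILLNEED"):
            # Advisory only: ask the kernel to start reading ahead.
            self.map.madvise(mmap.MADV_WILLNEED)

    def prefault(self) -> int:
        """Touch one byte per page so faults happen now, not on the query path."""
        pages = 0
        for off in range(0, self.size, mmap.PAGESIZE):
            _ = self.map[off]  # reading a byte faults the page in if needed
            pages += 1
        return pages

    def read(self, offset: int, length: int) -> bytes:
        # Plain memory access: no mmap()/munmap() per request.
        return self.map[offset:offset + length]

    def close(self) -> None:
        self.map.close()
        self._f.close()
```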

tyingq · on Jan 14, 2022

That makes sense. I wasn't going right to the conclusion that working around mmap() issues was easier, but it didn't seem to be explored much. Is the contention around having one file mmap()ed, or is it reduced if you use more files?

dboreham · on Jan 14, 2022

When I worked on/with BerkeleyDB in the late 90s we came to the conclusion that the various OS mmap() implementations had been tweaked/fixed to the point where they worked for the popular high profile applications (in those days: Oracle). So it can appear like everything is fine, but that probably means your code behaves the same way as <popular database du jour>.

tytso · on Jan 14, 2022

Um... Oracle (and other enterprise databases like DB2) don't use mmap. They use Direct I/O. Oracle does have anonymous (non-file-backed) memory which is mmap'ed and shared across various Oracle processes, called the Shared Global Area (SGA), but it's not used for I/O.
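For contrast, a minimal sketch of a Direct I/O read path, assuming Linux and Python 3.7+ (`os.O_DIRECT`, `os.preadv`). This is not Oracle's code; it only shows the constraint Direct I/O imposes: the buffer address, file offset and length must be aligned, here to an assumed 4 KiB block, which an anonymous mmap satisfies because it is page-aligned. Some filesystems reject `O_DIRECT` with `EINVAL`, and some platforms lack it entirely, so the sketch falls back to buffered I/O in those cases.

```python
import errno
import mmap
import os

BLOCK = 4096  # assumed logical block size; real code should query the device


def open_direct(path: str):
    """Open read-only with O_DIRECT when possible; return (fd, used_direct_io).

    Platforms without O_DIRECT (e.g. macOS) and filesystems that reject it
    with EINVAL fall back to ordinary buffered I/O.
    """
    if hasattr(os, "O_DIRECT"):
        try:
            return os.open(path, os.O_RDONLY | os.O_DIRECT), True
        except OSError as e:
            if e.errno != errno.EINVAL:
                raise
    return os.open(path, os.O_RDONLY), False


def read_block(fd: int, block_no: int) -> bytes:
    """Read one BLOCK-sized, BLOCK-aligned chunk (bypassing the page cache if fd has O_DIRECT)."""
    # O_DIRECT requires an aligned buffer address, offset and length. An
    # anonymous mmap is page-aligned, which covers a 4 KiB requirement.
    buf = mmap.mmap(-1, BLOCK)
    try:
        n = os.preadv(fd, [buf], block_no * BLOCK)
        return buf[:n]  # short read at EOF is allowed
    finally:
        buf.close()
```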

hyc_symas · on Jan 15, 2022

FWIW, I wrote a Direct I/O patch for BerkeleyDB but later withdrew it because it never improved I/O performance or memory footprint.

ayende · on Jan 14, 2022

Yes, isn't that wonderful?

You get to take advantage of literally decades of experience.

What is more, if you can match the profile of the optimization, you can benefit even more.

jandrewrogers · on Jan 14, 2022

Some issues with mmap() can be avoided entirely if you have your own buffer pool. Others are easier to handle because they are made explicit and more buffer state is exposed to the program logic. That's the positive side.

The downside is that writing an excellent buffer pool is not trivial, especially if you haven't done it before. There are many cross-cutting design concerns that have to be accounted for. In my experience, an excellent C++ implementation tends to be on the order of 2,000 lines of code -- someone has to write that. It also isn't simple code; the logic is relatively dense and subtle.
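To give a sense of the state a buffer pool makes explicit, here is a deliberately tiny sketch: a fixed set of frames, a page table, pin counts, dirty bits and CLOCK eviction, with reads and write-backs done via `os.pread`/`os.pwrite`. The class names and the 4 KiB page size are assumptions for illustration. It is single-threaded and leaves out latching, write-ahead logging and I/O scheduling, so it is nowhere near the kind of implementation described above.

```python
import os

PAGE_SIZE = 4096  # illustrative page size


class Frame:
    __slots__ = ("page_no", "data", "pins", "dirty", "ref")

    def __init__(self) -> None:
        self.page_no = None              # which file page this frame holds
        self.data = bytearray(PAGE_SIZE)
        self.pins = 0                    # >0 means in use; never evicted
        self.dirty = False               # modified since last write-back
        self.ref = False                 # CLOCK "recently used" bit


class BufferPool:
    """Toy single-threaded buffer pool over a file, with CLOCK eviction."""

    def __init__(self, fd: int, capacity: int) -> None:
        self.fd = fd
        self.frames = [Frame() for _ in range(capacity)]
        self.table = {}   # page_no -> frame index
        self.hand = 0     # CLOCK hand
        self.reads = 0    # physical page reads, exposed for observability

    def _victim(self) -> int:
        # Two sweeps: the first may only clear reference bits.
        for _ in range(2 * len(self.frames)):
            i = self.hand
            self.hand = (i + 1) % len(self.frames)
            f = self.frames[i]
            if f.pins:
                continue
            if f.ref:
                f.ref = False  # second chance
                continue
            return i
        raise RuntimeError("buffer pool exhausted: every frame is pinned")

    def _write_back(self, f: Frame) -> None:
        if f.dirty:
            os.pwrite(self.fd, f.data, f.page_no * PAGE_SIZE)
            f.dirty = False

    def pin(self, page_no: int) -> bytearray:
        """Return the page's buffer, loading it on a miss. Caller must unpin()."""
        i = self.table.get(page_no)
        if i is None:
            i = self._victim()
            f = self.frames[i]
            if f.page_no is not None:
                self._write_back(f)        # explicit, synchronous write-back
                del self.table[f.page_no]
                f.page_no = None
            data = os.pread(self.fd, PAGE_SIZE, page_no * PAGE_SIZE)
            self.reads += 1
            f.data[:] = data.ljust(PAGE_SIZE, b"\0")  # zero-fill past EOF
            f.page_no = page_no
            self.table[page_no] = i
        f = self.frames[i]
        f.pins += 1
        f.ref = True
        return f.data

    def unpin(self, page_no: int, dirty: bool = False) -> None:
        f = self.frames[self.table[page_no]]
        if f.pins == 0:
            raise RuntimeError("unpin() without a matching pin()")
        f.pins -= 1
        f.dirty = f.dirty or dirty

    def flush(self) -> None:
        """Write back every dirty page, then make it durable."""
        for f in self.frames:
            if f.page_no is not None:
                self._write_back(f)
        os.fsync(self.fd)
```

The trade-off compared with mmap: eviction and write-back happen at a known point inside `pin()`, so an I/O error is an ordinary exception rather than a signal, and the program, not the kernel, decides which pages stay resident.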