"If you aren’t using mmap, on the other hand, you still need to handle of all those issues"
Which seems like a reasonable statement. Is it less work to make your own top-to-bottom buffer pool, and would that necessarily avoid similar issues? Or is it less work to use mmap(), but address the issues?
QuestDB's author here. I share Ayende's sentiment. There are things the OP paper doesn't mention that can help mitigate some of the disadvantages:
- single-threaded calls to 'fallocate' help avoid sparse files and SIGBUS during memory writes
- over-allocating, caching memory addresses and minimizing OS calls
- transactional safety can be implemented via a shared memory model
- hugetlb can minimize TLB shootdowns
I personally have no regrets about using mmap, because of all the benefits it provides.
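To make the first two bullets concrete, here is a minimal sketch of preallocating before mapping. It is not QuestDB's code: `open_preallocated`, `round_up`, and the 1 MiB `CHUNK` are hypothetical names and values chosen for illustration, and it assumes Linux (or another platform where `os.posix_fallocate` is available).

```python
import mmap
import os
import tempfile

CHUNK = 1 << 20  # assumed over-allocation unit (1 MiB), purely illustrative


def round_up(n, chunk=CHUNK):
    """Round n up to the next multiple of chunk."""
    return (n + chunk - 1) // chunk * chunk


def open_preallocated(path, needed):
    """Create a file, reserve its blocks up front, then map it read/write."""
    size = round_up(needed)  # over-allocate so the file is grown less often
    fd = os.open(path, os.O_RDWR | os.O_CREAT, 0o600)
    try:
        # Reserve the blocks now. Without this the file may be sparse, and a
        # later store into the mapping can fail with SIGBUS if the disk is full.
        os.posix_fallocate(fd, 0, size)
        return mmap.mmap(fd, size, access=mmap.ACCESS_WRITE)
    finally:
        os.close(fd)  # Python's mmap keeps its own duplicate of the descriptor


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as d:
        m = open_preallocated(os.path.join(d, "column.d"), 100_000)
        m[0:5] = b"hello"  # plain memory write, no system call per store
        m.flush()
        m.close()
```

The "single-threaded" part is not shown: the idea would be to funnel all such calls through one thread or lock so that file growth never races with writers.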
I suppose. Some problems with mmap() are a bit hard to fix from user land though. You will hit contention on locks inside the kernel (mmap_sem) if the database does concurrent high-throughput mmap()/munmap(). I don't follow Linux kernel development closely enough to know whether this has improved recently, but it was easy to reproduce 4-5 years ago.
That makes sense. I wasn't going right to the conclusion that working around mmap() issues was easier, but it didn't seem to be explored much. Is the contention around having one file mmap()ed, or is it reduced if you use more files?
When I worked on/with BerkeleyDB in the late 90s we came to the conclusion that the various OS mmap() implementations had been tweaked/fixed to the point where they worked for the popular high profile applications (in those days: Oracle). So it can appear like everything is fine, but that probably means your code behaves the same way as <popular database du jour>.
Um... Oracle (and other enterprise databases like DB2) don't use mmap. They use Direct I/O. Oracle does have anonymous (non-file-backed) memory which is mmap'ed and shared across various Oracle processes, called the System Global Area (SGA), but it's not used for I/O.
Some issues with mmap() can be avoided entirely if you have your own buffer pool. Others are easier to handle because they are made explicit and more buffer state is exposed to the program logic. That's the positive side.
The downside is that writing an excellent buffer pool is not trivial, especially if you haven't done it before. There are many cross-cutting design concerns that have to be accounted for. In my experience, an excellent C++ implementation tends to be on the order of 2,000 lines of code -- someone has to write that. It also isn't simple code; the logic is relatively dense and subtle.
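For readers who haven't seen one, here is a toy sketch of what "buffer state exposed to the program logic" can look like. It is nowhere near the 2,000-line production version described above: no concurrency, no async I/O, no write-ahead ordering. `BufferPool`, `Frame`, and the 4096-byte `PAGE_SIZE` are hypothetical names and values for illustration, and it assumes a Unix platform with `os.pread`/`os.pwrite`.

```python
import os
from collections import OrderedDict

PAGE_SIZE = 4096  # assumed page size for this sketch


class Frame:
    """One cached page plus the state mmap() would hide from you."""

    def __init__(self, data):
        self.data = bytearray(data)
        self.pins = 0       # callers currently using the page
        self.dirty = False  # page has changes not yet written back


class BufferPool:
    def __init__(self, fd, capacity):
        self.fd = fd
        self.capacity = capacity
        self.frames = OrderedDict()  # page_no -> Frame, least recently used first

    def pin(self, page_no):
        """Return the frame for page_no, reading it from disk on a miss."""
        frame = self.frames.get(page_no)
        if frame is None:
            if len(self.frames) >= self.capacity:
                self._evict()
            data = os.pread(self.fd, PAGE_SIZE, page_no * PAGE_SIZE)
            frame = Frame(data.ljust(PAGE_SIZE, b"\0"))  # zero-fill past EOF
            self.frames[page_no] = frame
        self.frames.move_to_end(page_no)  # mark as most recently used
        frame.pins += 1
        return frame

    def unpin(self, page_no, dirty=False):
        frame = self.frames[page_no]
        frame.pins -= 1
        frame.dirty = frame.dirty or dirty

    def _evict(self):
        """Drop the least recently used unpinned page, writing it back if dirty."""
        victim = next((n for n, f in self.frames.items() if f.pins == 0), None)
        if victim is None:
            raise RuntimeError("all frames are pinned")
        frame = self.frames.pop(victim)
        if frame.dirty:
            os.pwrite(self.fd, frame.data, victim * PAGE_SIZE)

    def flush(self):
        """Write back every dirty page, then fsync. The program decides when."""
        for page_no, frame in self.frames.items():
            if frame.dirty:
                os.pwrite(self.fd, frame.data, page_no * PAGE_SIZE)
                frame.dirty = False
        os.fsync(self.fd)
```

The point is that eviction and write-back happen at places the program chooses, so an I/O error comes back as an exception from `pin` or `flush` rather than as a signal in the middle of a memory access.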