4. Lots of small building blocks of static machine code, precompiled and shipped with the DB software binary, then jumped into and looped through at run time based on the query plan the optimizer came up with. Oracle does this with their columnar/vector/SIMD processing In-Memory Database option (it's not like LLVM, as it doesn't compile/link/rearrange the binary building blocks, it just jumps and loops through the existing ones in the required order). A rough sketch of the shape of this is below.
Edit: It's worth clarifying that the entire codebase does not run like that, not even the entire plan tree - just the scans and tight vectorized aggregation/join loops on the columns/data ranges that happen to be held in RAM in a columnar format.
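The shape of the idea, in a toy C sketch (this is not Oracle's actual code and every name here is made up): the precompiled vectorized kernels sit in a table, and the "plan" is just the order in which the engine jumps through them.

    /* Toy sketch: precompiled kernels in a table, plan = jump order. */
    #include <stdint.h>
    #include <stdio.h>

    typedef void (*kernel_fn)(const int32_t *in, int32_t *out, size_t n);

    /* Precompiled building blocks; a real engine would ship SIMD variants. */
    static void scan_gt_kernel(const int32_t *in, int32_t *out, size_t n) {
        for (size_t i = 0; i < n; i++) out[i] = in[i] > 100;  /* whole-vector loop */
    }
    static void sum_kernel(const int32_t *in, int32_t *out, size_t n) {
        int32_t acc = 0;
        for (size_t i = 0; i < n; i++) acc += in[i];
        out[0] = acc;
    }

    static kernel_fn kernels[] = { scan_gt_kernel, sum_kernel };

    /* The "query plan" is just kernel indexes to jump through, in order. */
    static void run_plan(const int *plan, size_t steps,
                         const int32_t *col, int32_t *out, size_t n) {
        for (size_t s = 0; s < steps; s++)
            kernels[plan[s]](col, out, n);   /* jump into native code */
    }

    int main(void) {
        int32_t col[] = {50, 150, 200, 10}, out[4];
        int plan[] = {0, 1};                 /* scan, then aggregate */
        run_plan(plan, 2, col, out, 4);
        printf("sum: %d\n", out[0]);         /* 410 */
        return 0;
    }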
Isn't what you're describing just an interpreter, bytecode or otherwise? An interpreter walks through a data structure to determine which bits of precompiled code to execute, in which order, and with which arguments. In a bytecode interpreter the data structure is a sequential array; in a plain interpreter it's something more complicated, like an AST.
Is this just the same as "interpret the query plan" or am I missing something?
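For concreteness, the bytecode-interpreter shape described above is roughly this in C (a toy sketch): a flat opcode array decides which bit of precompiled code runs next, with a dispatch loop in the middle of everything.

    /* Toy bytecode interpreter: the data drives every jump. */
    #include <stdio.h>

    enum { OP_PUSH, OP_ADD, OP_PRINT, OP_HALT };

    static void interpret(const int *code) {
        int stack[64], sp = 0, pc = 0;
        for (;;) {
            switch (code[pc++]) {            /* dispatch: data decides the jump */
            case OP_PUSH:  stack[sp++] = code[pc++]; break;
            case OP_ADD:   sp--; stack[sp - 1] += stack[sp]; break;
            case OP_PRINT: printf("%d\n", stack[sp - 1]); break;
            case OP_HALT:  return;
            }
        }
    }

    int main(void) {
        int code[] = { OP_PUSH, 2, OP_PUSH, 40, OP_ADD, OP_PRINT, OP_HALT };
        interpret(code);                     /* prints 42 */
        return 0;
    }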
It interprets where it needs to jump, and then jumps to precompiled binary machine code locations that execute natively on the CPU, with no interpretation going on inside them (conceptually a bit like instruction traces in a CPU's trace cache, but done at a higher level). These precompiled building blocks ship in separate libraries, one library per CPU architecture on a platform (SSE4.2, AVX, AVX2, AVX-512 on AMD64, and other SIMD architectures on the other platforms Oracle supports). Oracle dynamically loads the library matching the CPU's capabilities at startup and starts jumping there as needed.
So this is not bytecode interpretation; it's native machine code executed directly on the CPU whenever the SQL exec engine decides to jump there. This is different from, say, Apache Impala, which compiles fresh LLVM machine code for each query execution, for that query's tight scanning loops.
Edit: I guess the reason I see this as not just regular bytecode interpretation (I don't actually know that much about interpreters in general) is that these building blocks can contain entire loops (and can do their thing on whole vectors of data), so quite a bit of complexity can be pushed into the machine code sections before control returns to normal interpreted AST/opcode land.
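A sketch of what that startup-time selection could look like in C. The library names here are hypothetical, and it assumes the GCC/Clang __builtin_cpu_supports builtin plus POSIX dlopen (link with -ldl):

    /* Probe the CPU once, load the matching prebuilt kernel library,
       then jump straight into its native code from there on. */
    #include <dlfcn.h>
    #include <stdio.h>

    typedef void (*kernel_fn)(const int *in, int *out, unsigned long n);

    static void *load_kernel_lib(void) {
        const char *lib = "libkernels_sse42.so";            /* baseline */
        if (__builtin_cpu_supports("avx2"))    lib = "libkernels_avx2.so";
        if (__builtin_cpu_supports("avx512f")) lib = "libkernels_avx512.so";
        void *h = dlopen(lib, RTLD_NOW);
        if (!h) fprintf(stderr, "dlopen: %s\n", dlerror());
        return h;
    }

    int main(void) {
        void *h = load_kernel_lib();
        if (!h) return 1;
        /* Resolve one precompiled building block; the exec engine would
           then jump into such kernels in whatever order the plan needs. */
        kernel_fn scan = (kernel_fn)dlsym(h, "scan_gt_kernel");
        if (scan) { /* ... loop through columns via scan(...) ... */ }
        dlclose(h);
        return 0;
    }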
It's called a template JIT. You get to remove the interpreter control flow overhead but tend to end up with a lot of register shuffling at the boundaries. Simpler than doing things properly, usually faster than a bytecode interpreter.
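A rough illustration of the template-JIT flavour (x86-64 Linux only, purely illustrative): stitch a straight-line run of calls to the precompiled kernels into executable memory, so no dispatch loop is left at run time. Arguments would have to be marshalled into the fixed registers each template expects, which is the register shuffling at the boundaries mentioned above.

    /* Emit "call kernel" instructions back to back, then jump in once. */
    #include <stdint.h>
    #include <stdio.h>
    #include <string.h>
    #include <sys/mman.h>

    static void kernel_a(void) { puts("scan block"); }
    static void kernel_b(void) { puts("aggregate block"); }

    static unsigned char *emit_call(unsigned char *p, void (*fn)(void)) {
        uint64_t addr = (uint64_t)(uintptr_t)fn;
        *p++ = 0x48; *p++ = 0xB8;                 /* mov rax, imm64 */
        memcpy(p, &addr, 8); p += 8;
        *p++ = 0xFF; *p++ = 0xD0;                 /* call rax       */
        return p;
    }

    int main(void) {
        unsigned char *buf = mmap(NULL, 4096, PROT_READ | PROT_WRITE,
                                  MAP_PRIVATE | MAP_ANONYMOUS, -1, 0);
        if (buf == MAP_FAILED) return 1;
        unsigned char *p = buf;
        *p++ = 0x48; *p++ = 0x83; *p++ = 0xEC; *p++ = 0x08;  /* sub rsp, 8: keep stack aligned */
        p = emit_call(p, kernel_a);               /* "plan": a, b, a */
        p = emit_call(p, kernel_b);
        p = emit_call(p, kernel_a);
        *p++ = 0x48; *p++ = 0x83; *p++ = 0xC4; *p++ = 0x08;  /* add rsp, 8 */
        *p++ = 0xC3;                              /* ret             */
        mprotect(buf, 4096, PROT_READ | PROT_EXEC);
        ((void (*)(void))buf)();                  /* no dispatch loop left */
        return 0;
    }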