Stack-based VMs, like SQLite's (I think), ARE trees. A stack based VM's bytecode (without DUP and POP) is just the post-order depth-first traversal of the corresponding expression tree. With DUP you have a connected acyclic DAG. With POP you have an acyclic DAG with one or more components. With loops you have a full graph.
When looked at this way, a VM makes the most sense actually because a pointer-heavy tree implementation is just terrible for caching and is super wasteful. Also, most SQL plans are trees (Until you get to WITH RECURSIVE).
I'm working on a CEP in C right now and a stack-based bytecode is simply the most 'obvious' way to represent a tree when you don't want to have to deal with lots of memory allocations. Just pre-allocate a buffer large enough and increment and go.
Either way, try taking something like the following and printing an expression tree like the one below
LITERAL 2
LITERAL 3
LITERAL 4
MUL
PLUS
now, write something to produce on a terminal, the following:
Plus
Mul
3
4
2
You should be able to do this in a simple iteration through the ops. Try it! Then try it with the variations above.
Now, try writing ops to manipulate and rewrite it. It's actually really a fun way to represent trees.
This was the question I was going to ask! I've recently been diving into Lua 5.x internals and have been extremely impressed with Lua's register-based bytecode implementation. Lua has a stable byte code interface between patch releases (5.4.0 -> 5.4.1, etc) but not between major/minor revisions (5.3 -> 5.4). SQLite on the other hand does not consider this a public interface at all!
> "Remember: The VDBE opcodes are not part of the interface definition for SQLite. The number of opcodes and their names and meanings change from one release of SQLite to the next."
https://www.sqlite.org/opcode.html#the_opcodes
For anyone interested in understanding how a register-based VM operates I highly recommend:
Not a public one, because they want to have the freedom to change how it works. But if you type in `EXPLAIN CREATE TABLE x(y);` or `EXPLAIN SELECT 3;` or whatever into `sqlite3`, you can see it. And it's reasonably straightforward to dig into the source code (SQLite is very well-written C) to hook that up yourself if you really wanted to.
> Also, most SQL plans are trees (Until you get to WITH RECURSIVE).
WITH RECURSIVE can also generally be implemented tree-style, you just loop a bunch in one of the iterators.
There are some databases, like Umbra, which can have DAG query plans for the benefit of more efficient groupjoin (GROUP BY pushed down into a join). IIRC some of the unnesting patterns can also benefit from non-tree plans.
When looked at this way, a VM makes the most sense actually because a pointer-heavy tree implementation is just terrible for caching and is super wasteful. Also, most SQL plans are trees (Until you get to WITH RECURSIVE).
I'm working on a CEP in C right now and a stack-based bytecode is simply the most 'obvious' way to represent a tree when you don't want to have to deal with lots of memory allocations. Just pre-allocate a buffer large enough and increment and go.
Either way, try taking something like the following and printing an expression tree like the one below
now, write something to produce on a terminal, the following: You should be able to do this in a simple iteration through the ops. Try it! Then try it with the variations above.Now, try writing ops to manipulate and rewrite it. It's actually really a fun way to represent trees.