Performance analysis indicates that SQLite spends very little time on bytecode decoding and dispatch. Most CPU cycles are consumed walking B-Trees, comparing values, and decoding records - all of which happens in compiled C code. Bytecode dispatch accounts for less than 3% of total CPU time, according to my measurements.
So at least in the case of SQLite, compiling all the way down to machine code might provide a performance boost of 3% or less. That's not very much, considering the size, complexity, and portability costs involved.
A key point to keep in mind is that SQLite bytecodes tend to be very high-level (create a B-Tree cursor, position a B-Tree cursor, extract a specific column from a record, etc). You might have had prior experience with bytecode that is lower level (add two numbers, move a value from one memory location to another, jump if the result is non-zero, etc). The ratio of bytecode overhead to actual work done is much higher with low-level bytecode. SQLite also does these kinds of low-level bytecodes, but most of its time is spent inside of high-level bytecodes, where the bytecode overhead is negligible compared to the amount of work being done.
Part of the benefit of compiling bytecode (or anything) is specializing code to the context (types, values, etc.) in which it appears. While I don't doubt your analysis, it could be that the compiled C code in question is full of branches that can be folded away when specialized to the context of the query, such as the structure of the rows and the types and values of columns.
Basically all of what you are saying about high-level bytecodes applies to dynamic languages, too. But they benefit greatly from specializing each bytecode given static and dynamic context, and from shortcutting dataflow through local variables.
There's usually a cost to shuttling data between bytecodes too. When two are fused together, the second can lift the data from wherever the first left it, rather than routing it through a fixed location. Might that be what you mean by shortcutting dataflow?
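A tiny sketch of that shuttling cost, with invented names: the unfused form routes the intermediate through a fixed slot in the VM's register file, while the fused form keeps it in a C local that the compiler can hold in a machine register.

```c
/* Hypothetical illustration of fusing two opcodes (not SQLite code). */
static long long regs[8];  /* the VM's register file */

/* Unfused: a MUL opcode stores to regs[2], then an ADD opcode loads it back. */
long long mul_then_add(long long a, long long b, long long c) {
    regs[2] = a * b;     /* opcode 1 writes its output slot in memory */
    return regs[2] + c;  /* opcode 2 reads it back */
}

/* Fused: the product never touches the register file. */
long long fused_muladd(long long a, long long b, long long c) {
    long long t = a * b; /* stays in a local, i.e. a machine register */
    return t + c;
}
```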
Also, doing control flow in bytecode is usually slower than doing it in native code.
I wonder if the context in which the instructions occur is sufficiently finite in sqlite for ahead-of-time specialisation of the bytecode to be a better fit. That is, the program you're operating on isn't known until JIT time, but the bytecode implementations are. SQL should correspond to an unusually specific set of operations relative to a general-purpose language implementation.
The compiler will notice some values are constant when working with the bytecode. It can know ahead of time which arguments correspond to folding branches within the bytecode instructions and specialise correspondingly. If that works, you've emitted a sequence of calls into opcodes which are less branchy than they would otherwise be, at which point the opcode implementations start to look like basic blocks and a template JIT to machine code beckons.
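A sketch of that folding, with hypothetical types and names (not SQLite's): the generic opcode body branches on column type on every execution; once the query's types are known, the branch folds away and each specialized variant is a straight-line basic block of the kind a template JIT could emit directly.

```c
/* Hypothetical opcode specialization (not SQLite's actual code). */
typedef enum { COL_INT, COL_REAL } ColType;
typedef union { long long i; double r; } Value;

/* Generic opcode: one dynamic branch per execution. */
double op_to_real_generic(ColType t, Value v) {
    if (t == COL_INT) return (double)v.i;
    return v.r;
}

/* Specialized variants: the type branch has been folded away,
 * leaving branch-free bodies a template JIT could string together. */
double op_to_real_int(Value v)  { return (double)v.i; }
double op_to_real_real(Value v) { return v.r; }
```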
> Most CPU cycles are consumed in walking B-Trees, doing value comparisons, and decoding records
Emphasis added. This is because of SQLite3's varint encoding method for numbers. Performance-wise it was probably a mistake, though it's a form of compression, which might have paid off in terms of space.
(I seem to remember seeing something, possibly by you, about this before.)
I wonder if it would be possible to replace the varint encoding... Yes, it'd be a compatibility break in that older SQLite3s couldn't open newer DBs.
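For reference, the encoding in question stores integers big-endian, seven payload bits per byte with the high bit as a continuation flag, and a ninth byte (if reached) contributing all eight of its bits. This is a simplified sketch of the documented format; the real sqlite3GetVarint is a heavily unrolled version of this loop.

```c
/* Simplified decoder for SQLite's varint format (per the documented
 * file format); returns the number of bytes consumed (1-9). */
int get_varint(const unsigned char *p, unsigned long long *out) {
    unsigned long long v = 0;
    for (int i = 0; i < 8; i++) {
        v = (v << 7) | (p[i] & 0x7f);          /* accumulate 7 bits */
        if ((p[i] & 0x80) == 0) {              /* high bit clear: done */
            *out = v;
            return i + 1;
        }
    }
    v = (v << 8) | p[8];                       /* 9th byte: all 8 bits */
    *out = v;
    return 9;
}
```

Decoding the bytes 0x82 0x2C yields 300 in two bytes; every multi-byte integer read while decoding a record pays this kind of bit-twiddling cost, which is part of the "decoding records" time the quoted comment points at.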
I never used it myself, but I read a big thread about SQL vs RPG on a Russian-speaking forum, where a person who works with the IBM i platform at some big bank posted a ton of examples. The basic operations look like SQLite bytecodes: open a table, move the cursor past the record with key value "X", get some fields, update some field, plus loops, ifs, basic arithmetic, etc.
Well, having programmed RPG professionally, there is no question that a python programmer would clock any RPG programmer. RPG is a terrible way to program.
It is possible to construct a worst case scenario for bytecode execution, for example very complex expressions in the WHERE and/or SELECT clauses that compute values, and a query plan that performs a full table scan over a table with say 100 million rows that is cached in RAM (or maybe use generate_series, whatever works best).
Computing the expressions should dominate execution time, right?
Then, to compare against the best possible case, we can write a custom C program that uses sqlite internals to perform the same task (full scan of the table, extract values from row) and does not use the bytecode VM and computes the complex expressions in regular C code (e.g. a function that accepts floats and returns a float or whatever).
Then comparing the two implementations will tell us how much faster sqlite can be if it had a "perfect" JIT.
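As a rough standalone analogue of that experiment (without linking against SQLite's internals), one can compare a switch-dispatched expression evaluator against the same expression as plain compiled C over a synthetic scan. The opcodes and the expression here are invented for illustration.

```c
/* Hypothetical micro-benchmark skeleton: VM-evaluated vs native
 * evaluation of the same per-row expression, x * 3 + 7. */
#include <stddef.h>

enum { E_LOAD, E_CONST, E_MUL, E_ADD, E_END };

/* Stack-machine program for: x * 3 + 7 */
static const int expr[] = { E_LOAD, E_CONST, 3, E_MUL, E_CONST, 7, E_ADD, E_END };

double eval_vm(double x) {
    double stack[8]; int sp = 0;
    for (const int *pc = expr; ; pc++) {
        switch (*pc) {
        case E_LOAD:  stack[sp++] = x; break;
        case E_CONST: stack[sp++] = (double)*++pc; break;
        case E_MUL:   sp--; stack[sp-1] *= stack[sp]; break;
        case E_ADD:   sp--; stack[sp-1] += stack[sp]; break;
        case E_END:   return stack[sp-1];
        }
    }
}

double eval_native(double x) { return x * 3.0 + 7.0; }

/* Sum the expression over nrows synthetic rows. */
double scan(double (*f)(double), size_t nrows) {
    double sum = 0;
    for (size_t i = 0; i < nrows; i++) sum += f((double)i);
    return sum;
}
```

Timing scan(eval_vm, N) against scan(eval_native, N) for large N bounds the interpreter overhead for expression-heavy work; the gap is a rough upper estimate of what a "perfect" JIT could recover for this kind of query.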
> A key point to keep in mind is that SQLite bytecodes tend to be very high-level
That's right. "Bytecode" is a spectrum. SQLite's bytecode is higher-level than e.g. WASM, JVM, Python, etc. (Notably, because the original source code is higher-level.)
A while back I was musing about whether it was possible to come up with something resembling the instruction set of a CPU for an abstract relational engine. Is this basically what SQLite is doing with bytecodes?
I think yes, essentially. Bytecode running on a software interpreter is the same thing as machine code running on a silicon interpreter. Probably slower, probably a different ISA, but very much the same idea.
The OP suggests the sqlite bytecode also has arithmetic & control flow in it, which you might want to exclude in other settings. There's a parallel with CISC vs RISC instruction sets here, and with calling into a compiler runtime vs expanding instructions in place into larger numbers of more primitive ones.
@rkrzr there's a circular definition in your "no" - if "CPU" means "an interpreter that executes instructions simpler than SQL" then this is indeed not an instruction set. If "CPU" means "interpreter of a given ISA" then it could be. The virtual machine in the sqlite codebase is the definition of the CPU for this instruction set. Unless you want to define CPU as exclusive of software implementations of interpreters.
> Bytecode dispatch is using less than 3% of the total CPU time, according to my measurements
> compiling all the way down to machine code might provide a performance boost 3% or less
This logic doesn't seem sound? The fact that the application now spends 3% on bytecode dispatch doesn't tell us anything about how long it would instead spend on e.g. interpreting SQL.
Unfortunately you can’t do performance analysis this way, but I think the overall point that it’s a small fraction probably still stands, as you’d expect the I/O work to dominate.
The reason you can’t do the analysis the way you explicitly stated (which, to be fair, is what was implied) is that when you lower the code to machine code you typically get rid of branches in the execution path. Since branches slow down execution by a disproportionate amount relative to how long they themselves take, it’s easy to get a >3% boost by getting rid of code paths that seem like there’s only 3% of room.
What the author also failed to mention is that they’ve gone on many optimization hunts eking out less than a percent here or there to get a cumulative boost, so 3% isn’t anything to sneeze at.
That being said, the broader point is valid: the maintenance cost isn’t worth it for the performance profile SQLite targets, not to mention that JITs open up an attack vector for security exploits. So the cost vs benefit is squarely on the interpreted side for now.
Edit: Forgot to mention that performance-tuning a JIT to work well is also hard - for small queries you’ll spend more time compiling the bytecode than you would just executing it. That’s why all the big JS engines take a tiered approach where each tier optimizes a bit more.