I recently implemented my own expression evaluator in java for in-memory data fr...

eklavya · on April 30, 2024

Bytecode will only impact packing, so more efficient ram, cache and cpu wise. But I don't understand how it would help with branching? You still have to make the same decisions? As in the bytecode executor still needs to do differnt things based on the op code, its not in hardware.

alex_smart · on April 30, 2024

Consider evaluating an expression like "col1 + col2 == col3"

A tree based evaluator has to do something like -

    for (int i = 0; i < df.size(); i++) {
        // switch on expression type (column, binary operator, unary operator etc)
        // switch on expression operator type (add, subtract, equals etc)
        // switch on column type (int, long, float etc)
        // actual evaluation
    }

whereas a bytecode based evaluator is able to run something equivalent to -

    for (int i = 0; i < df.size(); i++) {
        result[i] = df.getLong(column1Index, i) + df.getLong(column2Index, i)) == df.getLong(column3Index, i)
    }

naasking · on April 30, 2024

You still have to branch on opcode selection which you're omitting in your translation. Branching on "expression type" and column type in your first example is also unnecessary. Bytecodes have different opcodes to operate on different types and different arities, so in both cases you have only one unpredictable branch for each operation/opcode.

The main benefit of bytecodes is then cache friendliness, by removing all of those unpredictable loads and stores to obtain the expression details (opcode, arguments, etc.). All of those are inlined into the bytecode array which is already in cache.

alex_smart · on April 30, 2024

> Branching on "expression type" and column type in your first example is also unnecessary

It’s not?

naasking · on April 30, 2024

It is unnecessary, only branching on expression operator type is strictly necessary. You're trying to abstract too much in your AST which yields an inefficient interpreter. I can invent an inefficient bytecode too, but that wouldn't be a correct evaluation of how fast bytecode interpretation could be when done right.

hickelpickle · on April 30, 2024

There is threaded bytecode as well, which uses direct jumping vs a switch for dispatch. This can improve branch prediction, though it is a debated topic and may not offer much improvement for modern processors.

kaba0 · on April 30, 2024

Do you have perhaps some links/references on that?

I have once tried benchmarking it by writing a tiny VM interpreter and a corresponding threaded one with direct jumps in Zig (which can force inline a call, so I could do efficient direct jumps) and I have - to me surprisingly- found that the naive while-switch loop was faster, even though the resulting assembly of the second approach seemed right.

I wasn’t sure if I saw it only due to my tiny language and dumb example program, or if it’s something deeper. E.g. the JVM does use direct threaded code for their interpreter.

immibis · on April 30, 2024

How does it know where to jump to?

chuckadams · on April 30, 2024

The jump target is compiled into the bytecode, so rather than return to the big switch statement, it jumps straight to the next opcode's implementation. The process is called "direct threading". These days a decent switch-based interpreter should fit in cache, so I'm not sure direct threading is much of a win anymore.

sitkack · on April 30, 2024

You should look at https://janino-compiler.github.io/janino/ it can compile Java into class files in memory that can be directly executed.

alex_smart · on April 30, 2024

I found this by looking at spark source code to figure out what they were doing under the hood to deal solve this problem. :)