Pro jit hacker here. Super cool seeing folks build stuff like this.
Thoughts:
- Great to see an SSA pipeline, but that implies this JIT is not ideal for baseline JITing. That's fine. I would just explicitly call out that although it's light, it is an optimizer.
- Which reminds me - a JIT-based VM is defined most by how powerful its top JIT is, not by how light it is. Maybe since MIR is an optimizer, it's meant to eventually not be light? It's generally a bad idea to set out to build a JIT and then aim in between baseline and full opt. These middle light JITs are best built after you've evolved your top and bottom tiers (sorta like what happened in V8).
- JIT IRs need to be given space to evolve to point directly at the user heap, which precludes any kind of reliable serialization. You can dump IR for debugging, but you can't parse IR. This MIR thing is serializable, which would make it an awesome AOT IR. But the serialization code will get in the way once folks try to implement JIT-based optimizations (anything speculative, for example). Best to kill the serialization now, before it becomes a problem.
- If the first front end is C11, then you risk biasing the JIT to be very C-oriented. Languages that need JITs usually also need weird stuff that C doesn't need. It's super important to test a weirder language. Maybe Lua, since it's weird but simple.
Anyway, this is cool shit. Nice to see people building jits!
“Using SSA simplifies the optimization implementations, but building and destroying this form is expensive. Using SSA for the MIR generator's short-pass pipeline makes little sense to me, so I don't use it.”
OK, so here's the deal with SSA and the running time of your compiler.
- You can convert code to SSA in something like O(N log N) or even close to O(N) if you get fancy enough. The conversion step is not very expensive to run, though it is annoying to implement. Converting to SSA is fine, that's not the problem.
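To make the "not very expensive" part concrete, here's a toy sketch (my own illustration, nothing to do with MIR's actual API) of the renaming half of SSA construction on straight-line code - it's one linear pass. The hard part this skips is Phi placement at join points, which needs dominance frontiers, but even with that the whole conversion stays near O(N log N):

```python
# Toy SSA renaming for a single basic block of three-address code.
# Each instruction is (dest, op, lhs, rhs); names not yet defined in
# the block are treated as live-in and left alone.
def to_ssa(block):
    version = {}  # latest SSA version number of each local variable
    out = []
    for dst, op, lhs, rhs in block:
        # read operands at their current SSA version
        lhs = f"{lhs}{version[lhs]}" if lhs in version else lhs
        rhs = f"{rhs}{version[rhs]}" if rhs in version else rhs
        # every definition gets a fresh name
        version[dst] = version.get(dst, -1) + 1
        out.append((f"{dst}{version[dst]}", op, lhs, rhs))
    return out

code = [("x", "+", "a", "b"),
        ("x", "*", "x", "c"),   # redefinition of x becomes x1
        ("y", "-", "x", "a")]
for ins in to_ssa(code):
    print(ins)
```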
- Optimizations implemented on top of SSA are cheap to run. That's sort of the point of SSA. So, that's not the problem.
- SSA optimizations achieve goodness when you have multiple of those optimizations running in something like a fixpoint. It's pointless to say "I have SSA" and then just implement one of those optimizations. I don't think MIR does that; like most SSA optimizers, it has multiple optimizations. The problem sort of starts here: SSA sort of implies that you're building an optimizer with multiple optimizations. Even if each opt is cheap, having SSA usually implies that you're going to pile on a bunch of them. So, you'll have cost either from death-by-a-thousand-cuts, or you'll eventually add an optimization that's superlinear.
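To illustrate the fixpoint point with a toy example (hypothetical IR, not MIR's): constant folding creates dead code, dead code elimination exposes more folding, so you rerun both until nothing changes. Each pass is linear, but the loop multiplies the cost:

```python
# Instructions are (dest, op, args) over SSA names.
def const_fold(code):
    env = {d: args[0] for d, op, args in code if op == "const"}
    out, changed = [], False
    for d, op, args in code:
        if op == "add" and all(a in env for a in args):
            out.append((d, "const", (env[args[0]] + env[args[1]],)))
            changed = True
        else:
            out.append((d, op, args))
    return out, changed

def dce(code, live_out):
    # Conservative liveness: anything referenced by a surviving
    # non-const instruction, plus the block's live-outs.
    used = set(live_out)
    for d, op, args in code:
        if op != "const":
            used.update(args)
    out = [ins for ins in code if ins[0] in used]
    return out, len(out) != len(code)

def optimize(code, live_out):
    changed = True
    while changed:  # fixpoint: rerun until neither pass improves anything
        code, c1 = const_fold(code)
        code, c2 = dce(code, live_out)
        changed = c1 or c2
    return code

prog = [("a", "const", (2,)),
        ("b", "const", (3,)),
        ("c", "add", ("a", "b")),    # folds to const 5
        ("d", "add", ("c", "c"))]    # then folds to const 10
print(optimize(prog, live_out={"d"}))
```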
- After you run SSA optimizations, you'll have nontrivial global data flow - as in, there will be data flows between basic blocks that weren't there when you started, and if you do a lot of different SSA optimizations, then those data flows will have a very complex shape. That's caused by the fact that SSA is really good at enabling compilers to "move" operations to the place in control flow where they make the most sense. Sometimes operations get moved very far, leading to values that are live for a long time (live across many instructions). This is where the problem gets significantly worse.
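A toy example of that "moving" (again my own sketch, not MIR code): hoisting a loop-invariant operation out of a loop is easy in SSA precisely because nothing redefines a value - but the hoisted value is now live across the entire loop body:

```python
# Minimal loop-invariant code motion over (dest, op, args) instructions.
def hoist_invariant(loop_body, loop_vars):
    """loop_vars: names (re)defined inside the loop, e.g. the induction
    variable. Anything whose args don't touch them is invariant."""
    hoisted, body = [], []
    for dest, op, args in loop_body:
        if all(a not in loop_vars for a in args):
            hoisted.append((dest, op, args))  # moved above the loop
        else:
            body.append((dest, op, args))
            loop_vars = loop_vars | {dest}    # its result varies per trip
    return hoisted, body

# s1 = s0 + (a * b) inside a loop over i; a and b are defined before it.
loop = [("t", "mul", ("a", "b")),
        ("s1", "add", ("s0", "t"))]
print(hoist_invariant(loop, {"i", "s0"}))
# "t" moves out, yet is still used inside - so it's now live
# across every instruction of the loop.
```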
- Even without SSA optimizations, SSA form adds complexity to the data flow graph because of Phi nodes. You almost always want the Phi and its inputs to use the same register in the end, but that's not a given. This adds to the problem even more.
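To make the Phi problem concrete (assumed IR, not MIR's): lowering out of SSA turns each Phi into a move at the end of every predecessor block. If the register allocator can't coalesce the Phi with its inputs, all of those moves survive as real instructions:

```python
def lower_phi(phi_dest, incoming):
    """incoming maps predecessor block -> SSA name flowing in.
    Returns the move to append at the end of each predecessor."""
    return {pred: f"{phi_dest} = {src}" for pred, src in incoming.items()}

# x2 = phi(x0 from B1, x1 from B2)
moves = lower_phi("x2", {"B1": "x0", "B2": "x1"})
print(moves)  # {'B1': 'x2 = x0', 'B2': 'x2 = x1'}
# If x0, x1, and x2 all end up in the same register, both moves are
# no-ops and can be deleted - that's what Phi coalescing is for.
```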
- Probably the reason why you went for SSA was perf. But to get perf from SSA-optimized code, you need to have some way of detangling that data flow graph. You'll want to coalesce those Phis, for example - otherwise SSA will actually pessimise your code by introducing lots of redundant move instructions. Then you'll also want to split ranges (make it so a variable can have one register in one part of a function and a different register in another part), since SSA optimizations are notorious for creating variables that live for a long time and interfere with everything. And then you'll definitely need a decent register allocator. No matter how you implement them, the combo of coalescing, splitting, and register allocation will be superlinear. It's going to be a major bottleneck in your compiler, to the point that you'll wish you had a baseline JIT.
Those things I mentioned - coalesce, split, regalloc - are expensive enough that you won't want them in a JIT that is the bottom tier. But they're great to have in any compiler that isn't the bottom tier. So, it's a good idea to have some non-SSA baseline (or template, or copy-and-patch, or whatever you want to call it) JIT, or a decent interpreter, as your bottom tier.
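To sketch where the cost concentrates (a minimal assumed example, nowhere near a production allocator): register allocation is coloring an interference graph, and the long-lived temps that SSA optimizations create interfere with everything they overlap, forcing spills once the colors run out:

```python
# Greedy graph coloring, highest-degree first. Real allocators
# interleave this with coalescing and live-range splitting, which is
# where the superlinear cost lives.
def color(interference, k):
    """interference: {var: set of vars live at the same time}; k registers."""
    assignment = {}
    for v in sorted(interference, key=lambda v: -len(interference[v])):
        taken = {assignment[n] for n in interference[v] if n in assignment}
        free = [r for r in range(k) if r not in taken]
        assignment[v] = free[0] if free else "spill"
    return assignment

# A long-lived temp "t" (think: a hoisted loop invariant) interferes
# with everything, so with 2 registers something must spill.
graph = {"t": {"a", "b", "c"},
         "a": {"t", "b"},
         "b": {"t", "a"},
         "c": {"t"}}
print(color(graph, k=2))
```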