> I know that I’m never impressed with the JavaScript side of the bindings produced by Emscripten
i don't mind saying, having spent much of the past 2 weeks in and around that code, that much of the generated part of the JS/wasm "glue" is... Much of it looks like it was thrown together by someone who half-understood JS and was just glad that it worked, with little or no attention to detail and refinement. It could use some TLC.
Yeah, last time I looked Emscripten’s were definitely considerably worse than wasm-bindgen’s, but that was a few years ago.
There’s also just a lot of missed opportunity for things like abbreviating identifiers, plus stuff that’s completely normal for optimising compilers like GCC or LLVM, like inlining and safe code reordering to eliminate completely unnecessary temporary variables and the like, but for which absolutely no equivalent tooling exists in JavaScript. And I have no idea why that’s the case. I know of only two even vaguely interesting projects along these general lines: Google’s Closure Compiler (2009–, though functionally I don’t think much interesting has happened in the last decade), which is too esoteric and requires too many compromises for most people to use (and it didn’t help that it’s written in Java); and Facebook’s Prepack (2018), which tried doing partial evaluation, but they gave up on it before it really got anywhere useful. Everything else is just quite hopeless, almost never going beyond very simplistic syntactic transformations that don’t modify semantics.
> almost never going beyond very simplistic syntactic transformations that don’t modify semantics.
I don’t know if this is advanced enough to not be completely hopeless but here’s what I see in my webpack config from 2018:
- hoisting and sharing common constant expressions including non-mutated object literals
- module, function, and variable names are all “abbreviated”
- there’s weird tricks of some kind going on for bound method definitions on classes
- Some elimination of “temporary variables” from object properties, arguments, or module imports
I think all those transforms are enough to substantially reduce the code size.
The real optimizations, like inlining, monomorphizing, hidden classes, etc., are best left to the real compiler, which in Chrome/v8’s case is quite competitive with LLVM/GCC.
The things you are describing are what I’m calling simplistic syntactic transformations that don’t modify semantics.
As the slightest taste of the sort of thing I’m talking of:
• Any sane developer would be happy to rewrite `let x=a.c;a.b(x)` as `a.b(a.c)`, but that changes semantics (accessing a.b and a.c could have side-effects) and so no tooling short of Closure Compiler with appropriate hints is willing to do it. This sort of pattern is extremely common. End result: readable code is bigger than it should be.
• Tools like Terser are, on a good day, willing to inline functions that are used only once; but they only do this on free functions. Use methods, and they’re useless. Again, Closure Compiler with appropriate hints can do a bit better. End result: abstractions and good factoring are expensive, producing bigger and slower code.
• In fact, this generalises: state-of-the-art JavaScript tooling can do a little where everything is on the stack, but use objects and methods and such and they’re stymied. Inlining. Dead code removal. Name mangling. All things that work inside modules, function bodies, &c., but don’t work once you use objects and properties. Closure Compiler is the only tool that can do anything at all here, which is a real shame. Someone really should have built something like it atop TypeScript by now.
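To make the first bullet concrete, here is a minimal sketch of why a tool can't blindly collapse the temporary. `makeObject` and its logging getters are hypothetical, invented purely for illustration; the point is only that the two forms read `a.b` and `a.c` in a different order, which becomes observable as soon as property access has side effects.

```javascript
// Hypothetical object whose properties are getters that record each access.
function makeObject(log) {
  return {
    get b() {
      log.push("get b");
      return (value) => {
        log.push("call b");
        return value;
      };
    },
    get c() {
      log.push("get c");
      return 42;
    },
  };
}

// Original form: a.c is read into a temporary before a.b is looked up.
function withTemporary(a) {
  let x = a.c;
  return a.b(x);
}

// "Obvious" rewrite: a.b is looked up first, then the argument a.c is read.
function inlined(a) {
  return a.b(a.c);
}

const logTemporary = [];
const logInlined = [];
const resultTemporary = withTemporary(makeObject(logTemporary));
const resultInlined = inlined(makeObject(logInlined));

console.log(logTemporary.join(", ")); // get c, get b, call b
console.log(logInlined.join(", "));   // get b, get c, call b
```

Both calls return the same value; only the access order differs. A tool has to prove (or be told) that neither property is a getter or proxy trap before it can treat the two forms as interchangeable.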
Going back a few years to where this was more prevalent and important, there was a significant difference of philosophy between Babel and Bublé, webpack and Rollup. Babel said “compile new constructs into something that does precisely the same thing”, at a significant code size, performance and readability cost, whereas Bublé said “compile new constructs into something small and fast that almost always does the same thing”, at a slight scope and compatibility cost. Webpack similarly did a robotic port of modules at a significant code size, performance and readability cost, whereas Rollup said “let’s unravel the now-unnecessary abstraction and produce efficient code”. In my view, Babel and webpack are profligate, doing the easy thing rather than the sensible thing; and Rollup does the sensible thing, at only a very slight compatibility cost (and Bublé did the mostly sensible thing, though at a much higher compatibility cost, but we don’t need it any more).
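As a rough illustration of that difference in philosophy (hand-written, not the actual output of Babel or Bublé), consider two ways one could lower a `for...of` loop into older-style code. `sumPrecise` and `sumLoose` are hypothetical names: the first follows the iterator protocol, the second simply assumes the input is array-like.

```javascript
// "Precise" lowering: drive the iterator protocol by hand, so any iterable works.
// (A fully faithful version would also call iterator.return() on early exit.)
function sumPrecise(iterable) {
  var total = 0;
  var iterator = iterable[Symbol.iterator]();
  for (var step = iterator.next(); !step.done; step = iterator.next()) {
    total += step.value;
  }
  return total;
}

// "Loose" lowering: assume an array-like input and index into it directly.
function sumLoose(list) {
  var total = 0;
  for (var i = 0; i < list.length; i += 1) {
    total += list[i];
  }
  return total;
}

const fromArrayPrecise = sumPrecise([1, 2, 3]);
const fromArrayLoose = sumLoose([1, 2, 3]);
const fromSetPrecise = sumPrecise(new Set([1, 2, 3]));
const fromSetLoose = sumLoose(new Set([1, 2, 3])); // a Set has no .length

console.log(fromArrayPrecise, fromArrayLoose); // 6 6
console.log(fromSetPrecise, fromSetLoose);     // 6 0
```

The loose version is shorter and has less machinery, and it agrees with the precise one for arrays, which is the common case; it silently does the wrong thing for other iterables. That is the "almost always does the same thing" trade-off in miniature.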
JavaScript engines are extraordinarily good at what they do, given the handicap that they start with. But if you can give them better code, they’ll fare better, especially before JITting occurs. I’m talking about giving them better-optimised code. Also about shipping smaller bundles.
Also, partial execution: if you haven’t worked with native code compilation, you might not realise just how good those things are, especially where mathematics is involved. I love the fact that, in benchmarking a Rust Base58 decoder I wrote recently, I had to use test::black_box on the input, or else decode_u64("jpXCZedGfVQ") evaluated to Ok(0xFFFFFFFFFFFFFFFF) at compile time (and it wasn’t even a const fn!). And Rust is only going further with varieties of guaranteed const evaluation. Prepack tried, but Facebook gave up on it for some reason. For me, I really just want something like Zig’s comptime in JavaScript (well, a compile-to-JavaScript JavaScript variant), as guaranteed partial execution.
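A sketch of what guaranteed partial evaluation could buy in JavaScript, mirroring the Base58 anecdote. `decodeU64` here is a hypothetical JavaScript stand-in for the Rust function, and the Bitcoin-style Base58 alphabet is an assumption on my part. The comment near the end shows what a comptime-style tool could, in principle, emit in place of the call; I'm not claiming any existing tool does this.

```javascript
// Assumption: Bitcoin-style Base58 alphabet (no 0, O, I or l).
const ALPHABET = "123456789ABCDEFGHJKLMNPQRSTUVWXYZabcdefghijkmnopqrstuvwxyz";
const U64_MAX = 0xFFFFFFFFFFFFFFFFn;

// Hypothetical decoder: returns a BigInt, or throws on bad input or overflow.
function decodeU64(text) {
  let value = 0n;
  for (const char of text) {
    const digit = ALPHABET.indexOf(char);
    if (digit === -1) throw new RangeError(`invalid Base58 character: ${char}`);
    value = value * 58n + BigInt(digit);
    if (value > U64_MAX) throw new RangeError("value does not fit in a u64");
  }
  return value;
}

// What the source says: the loop runs at run time, every time.
const decoded = decodeU64("jpXCZedGfVQ");

// What a partial evaluator could emit instead, since the input is a constant:
//   const decoded = 0xFFFFFFFFFFFFFFFFn;
console.log(decoded === U64_MAX); // true
```

The function is pure and its argument is a literal, so nothing about the result depends on run-time state. That is exactly the situation where a native compiler folds the whole call away, and where JavaScript tooling currently leaves the loop in the bundle.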
> definitely considerably worse than wasm-bindgen’s,
Thank you, wasm-bindgen is a new term for me (wasm as a whole has been new to me for about 2 weeks). i'll add that to the list of tools to check out, as we're actively exploring different options and methodologies at this point for wasm/sqlite.
I doubt wasm-bindgen will be your cup of tea as it’s Rust stuff, though perhaps there may be value in looking at what it generates.
For myself, I progressively lean in the direction of burning all of these things down (when possible, and it must be admitted that Emscripten’s strength is that it makes legacy stuff work) and writing binding JavaScript manually, also with less of an FFI/dual-sided-bindings/skip-blithely-between-languages flavour and more deliberate, less RPCy techniques.
> I doubt wasm-bindgen will be your cup of tea as it’s Rust stuff, though perhaps there may be value in looking at what it generates.
Yeah, Rust isn't part of the sqlite project's toolchain, but wasm is entirely new to the project and we're eager to learn more about it and to make sure that the JS/wasm code is not only usable but also "good code" (or at least "presentable" code!). Yes, we currently rely on emscripten's generated bits, but i've made an active effort to slowly trim down those dependencies as i get a clearer picture of where the borders between wasm, JS, emscripten, and client code lie. Ideally we wouldn't be dependent on one compiler, but currently we are. Baby steps!
Closure Compiler still works, I just wish it could output modern JS. It transpiles down to either ES3 or ES5 and there's no way to turn that off. Unfortunately there doesn't seem to be a good alternative.