They have some promising features, but they need to learn Rust before dissing it. Their explanations have inaccuracies, and examples have novice mistakes.
• There's no implicit string copying in Rust. Even when passing String ownership, it will usually be passed in registers. It's idiomatic to use &str by default.
• Rust doesn't do eager drop, but their explanation is clumsy. The borrow checker doesn't influence destructors. &String and usize can't have a destructor, and can be forgotten at any time.
• Their benchmark intended to demonstrate tail call cost compiles to a constant, with no `factorial()` calls at run time. They're only benchmarking calls to black_box(1307674368000). Criterion put read_volatile in there. Mojo probably uses a zero-cost LLVM intrinsic instead.
Hi author here, definitely not trying to diss Rust, I love Rust! I'm pointing out some interesting overheads that aren't well known by the average Rust programmer, which Mojo was able to improve upon with the power of hindsight and being a newer thing. For your points below, let me clarify:
> There's no implicit string copying in Rust. Even when passing String ownership, it will usually be passed in registers.
The String metadata can be passed in registers if LLVM does that optimization, but it's not guaranteed and doesn't always happen. A Rust move is just a memcpy, and there are situations where LLVM doesn't optimize it away, so Rust programs end up doing a lot more memcpy than people realize.
> It's idiomatic to use `&str` by default.
True if you want it to be immutable, but this actually adds to my point. That is the default behavior in Mojo without having to understand things like deref coercion and the difference between `&str` and `&String`. In Rust it's an unintuitive best practice, which everyone has to learn pretty early in their journey. In Mojo they get the best behavior by default, which gives them a more gentle learning curve, important for our Python audience. Default behavior > idiomatic things to learn.
> The borrow checker doesn't influence destructors.
I didn't claim that; my point was that Rust does do runtime checks, using drop flags, to decide whether a value should be dropped. This can be resolved statically during compilation, but won't be if the initialization state of an object is unknown at compile time: https://doc.rust-lang.org/nomicon/drop-flags.html
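A minimal sketch of the kind of code that needs a runtime drop flag (`flip` here is just a stand-in for any condition unknown at compile time):

```rust
fn main() {
    let flip = std::env::args().len() > 1; // not knowable at compile time
    let x;
    if flip {
        x = Box::new(42);
        println!("{x}");
    }
    // Whether `x` holds a live Box at the end of scope depends on `flip`,
    // so the compiler keeps a hidden drop flag and checks it before returning.
}
```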
> &String and usize can't have a destructor, and can be forgotten at any time
In the example, the call stack is growing with a new reference and usize in each frame for each call. This is why tail recursion in Rust has so many issues, those values need to be available to the end of scope to satisfy Rust's guarantees, they can't be "forgotten at any time". It also overflows the stack a lot faster.
> Their benchmark intended to demonstrate tail call cost compiles to a constant, with no `factorial()` calls at run time. They're only benchmarking calls to black_box(1307674368000). Criterion put read_volatile in there. Mojo probably uses a zero-cost LLVM intrinsic instead.
If the Rust benchmark isn't calling `factorial()`, it should be instant and faster than Mojo, but the Rust version is much slower. `benchmark.keep` in Mojo is a "clobber" directive, indicating that the value could be read or written at any time, so LLVM doesn't optimize away the function calls that produce the result.
Thanks for taking the time to read the post, and write out your thoughts. Really enjoying the discussion around these topics.
Rust optimizes factorial to be iterative, not using recursion (tail or otherwise) at all, and it turns `factorial(15, 1)` into `1307674368000`: https://rust.godbolt.org/z/bGrWfYKrP. As has been pointed out a few times, you're benchmarking `criterion::black_box` vs `benchmark.keep` (try the newer `std::hint::black_box`, which is built into the compiler and should have lower overhead).
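For anyone following along, here's a rough sketch of that hinted variant (not the blog's actual harness); wrapping the inputs in `std::hint::black_box` as well is what keeps the call from folding to a constant:

```rust
use std::hint::black_box;

fn factorial(n: u64, acc: u64) -> u64 {
    if n <= 1 { acc } else { factorial(n - 1, acc * n) }
}

fn main() {
    // black_box on the inputs stops LLVM from folding the whole call
    // into the literal 1307674368000 at compile time.
    let result = black_box(factorial(black_box(15), black_box(1)));
    println!("{result}");
}
```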
I updated the blog with full benchmark reproduction instructions. I also removed criterion::black_box altogether, and it resulted in no performance difference. Removing benchmark.keep from Mojo causes it to optimize away everything and run in less than a picosecond.
If you could show me a benchmark that supports what you're saying that'd be great, thanks.
Rust realizes the vector is never used, so it never does any allocation or recursion; it just turns into a loop counting up to 999_999_999.
And some back of the napkin math says there's no way either benchmark is actually allocating anything. Even if malloc took 1 nanosecond (it _doesn't_), 999_999_999 nanoseconds is 0.999999999 seconds.
It _is_ somewhat surprising that Rust doesn't realize the loop can be completely optimized away, like it does without the unused Vec, but this benchmark still isn't showing what you're trying to show.
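The exact benchmark isn't quoted in this thread, but the pattern under discussion is roughly this shape (a hypothetical reconstruction, not the blog's code):

```rust
// Hypothetical stand-in for the benchmarked function.
fn churn(n: u32) -> u32 {
    if n == 0 {
        return 0;
    }
    let _v: Vec<u8> = Vec::with_capacity(16); // allocated but never read
    churn(n - 1) + 1
}

fn main() {
    // Per the comments above: the unused Vec's allocation gets eliminated
    // and the recursion becomes a plain counting loop up to 999_999_999.
    println!("{}", churn(999_999_999));
}
```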
> The String metadata can be passed in registers if LLVM does that optimization, but it's not guaranteed and doesn't always happen. Rust move is just a memcpy, there are situations where LLVM doesn't optimize them away, resulting in Rust programs doing a lot more memcpy than people realize.
I believe even this is not quite correct, because memcpy here is an implementation detail of moves. Rust could relatively easily amend its (not yet standardized) ABI to not physically move arguments larger than some threshold, like many C++ ABIs do, if needed. I don't know the current status, but AFAIK it has been considered multiple times in the past.
Passing arguments in registers is not an optimization, but an ABI. It always happens up to a certain number of arguments, and Rust in particular uses an ABI that flattens more structs into registers than C++.
Other moves could be memcpys, but there's a distinction between Rust saying moves behave like memcpy, and moves actually being memcpys. String's 3×size_t (or 2×size_t for Box<str>/Arc<str>) is below LLVM's threshold for emitting an actual memcpy call. Rust has optimization passes for eliminating redundant moves.
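To make the sizes concrete (a quick sanity check, not from the original post):

```rust
use std::mem::size_of;

fn main() {
    // String is (pointer, capacity, length): three machine words.
    assert_eq!(size_of::<String>(), 3 * size_of::<usize>());
    // Box<str> and Arc<str> are (pointer, length): two words.
    assert_eq!(size_of::<Box<str>>(), 2 * size_of::<usize>());
    assert_eq!(size_of::<std::sync::Arc<str>>(), 2 * size_of::<usize>());
    // &str is also two words; &String and &mut String are a single pointer.
    assert_eq!(size_of::<&str>(), 2 * size_of::<usize>());
    assert_eq!(size_of::<&mut String>(), size_of::<usize>());
}
```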
You're giving an impression that memcpy happens all over the place, where in reality it's quite rare, and certainly doesn't happen in the simple cases you describe.
In Rust, knowledge of ownership and the zoo of strings is a requirement (e.g. use of &String is a novice error). It's nice that Mojo can hide it, and you could celebrate that without making dubious performance claims.
> True if you want it to be immutable, but this actually adds to my point.
Sigh, it adds to inaccuracies. Mutable strings are &mut String, passed as a single pointer, so a mutable string is an even better case of a thin reference that doesn't need memcpy.
> those values need to be available to the end of scope to satisfy Rust's guarantees
No they don't. You're conflating Rust's guaranteed Drop order (which does interfere with TCO) with borrow checking and stack usage, which don't. For references and Copy types, Rust has an "eager drop" behavior. Their existence on the stack is not guaranteed nor necessary.
Borrow checking scopes are hypothetical for the sake of the check, and don't influence code generation in any way. You can literally remove borrow checker and lifetimes from Rust entirely, and the code will compile to the same instructions — mrustc implementation is a real example of that.
Your example function where you try to demonstrate how the arguments prevent TCO compiles to a single `ret`.
> it should be instant and faster than Mojo
The `factorial()` is instant, but the `black_box` isn't because Rust/Criterion implements it differently than Mojo. So Mojo has a faster `benchmark.keep` function, and you failed to benchmark the relevant function, and presented a misleading benchmark with a wrong conclusion.
You should validate your claims on what Rust does by actually checking the output. Try https://rust.godbolt.org/ (don't forget to add -O to flags!) or using cargo-show-asm.
Hi, thanks for the discussion. I tried out a bunch of different benchmarks, and Rust TCO is actually working as you say it does, so I removed that part from the blog. I definitely need to upskill on assembly.
I removed criterion::black_box from Rust and it made no performance difference, so I updated the blog. If I removed benchmark.keep from Mojo, it ran in less than a picosecond, so I left it in to be fair.
Can you show me a benchmark to dispute my claim about recursion? I'm not sure what the generated assembly of a function that's not being run is meant to prove.
This can happen in languages that use dynamic constructs that can't be optimized out. For example, there was a PHP-to-native compiler (HipHop/HPHPc) that lost to faster interpreters and JIT.
Apple's Rosetta 2 translates x86-64 to aarch64 code that runs surprisingly fast, despite being mostly a straightforward translation of instructions rather than something clever like a recompiling, optimizing JIT.
And plain old C is relatively fast without optimizations, because it doesn't rely on abstraction layers being optimized out.
I don't think you can generalize it like that. It really depends whether the complexity is necessary or not, and that's context-dependent.
If the implementation is too simple, it may have issues that impact its reliability. For example, code may use a simple linear search instead of fancier data structures/algorithms, and struggle with larger inputs or have accidentally-quadratic code paths. Or it may update data files without the complexity of atomic writes or inter-process locking, and corrupt its data if it's unexpectedly killed/restarted.
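A contrived Rust sketch of the accidentally-quadratic pattern (all names made up for illustration):

```rust
use std::collections::HashSet;

// O(n²): every new item scans the whole `seen` vector.
fn dedup_naive(items: &[String]) -> Vec<String> {
    let mut seen: Vec<String> = Vec::new();
    for item in items {
        if !seen.contains(item) {       // linear search per element
            seen.push(item.clone());
        }
    }
    seen
}

// Roughly O(n): a hash set makes each membership check constant time on average.
fn dedup_fast(items: &[String]) -> Vec<String> {
    let mut seen: HashSet<&str> = HashSet::new();
    let mut out = Vec::new();
    for item in items {
        if seen.insert(item.as_str()) {
            out.push(item.clone());
        }
    }
    out
}
```

Both look fine at a hundred entries; only one of them is still fine at ten million.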
Implementations that are too simple may be less secure, e.g. lack features for precise access controls, or sandboxing. Simple parsers can become a vulnerability (e.g. request smuggling attacks from underestimating complexity of edge cases in HTTP and URLs).
Too-simple tools may just push complexity and security risks elsewhere. There's an endless supply of simple templating engines that make it too easy to introduce XSS vulnerabilities.
For someone who’s not a Windows developer this situation is incredibly confusing. I have no idea what is the correct way to make a modern Windows application.
It seems like everything made after 1995 is deprecated abandonware, and whatever UI they’re building next is going to be dead before they finish implementing it.
Genuinely asking: nowadays, are there desktop UI applications beyond the well-known ones? Many of the new popular ones (e.g. Slack) basically have an HTML renderer.
App developers want a consistent look for their application across various OSes, be it Mac/Win/Linux. This means developing with GTK for Linux and WinUI for Windows is out of the question. Why do app developers want a consistent look? Because otherwise they would have to create documentation and screenshots for each OS separately. And users wouldn't be able to help each other, because a Mac user would know a different interface/look than a Linux user.
OS developers like Apple and Microsoft don't have a singular look themselves. Microsoft didn't rewrite everything into the WinUI settings, and anytime anyone needs to do anything serious they have to go through the Win32 settings panels. This lack of OS coherency means app developers don't feel the need to stay consistent with the OS, since the OS itself is not uniform.
> App developers want a consistent look for their application across various OSes, be it Mac/Win/Linux.
No, app developers want to write their code once. A consistent look is just a byproduct of that desire.
I’m also glad you didn’t mention anything about the customer, because ostensibly the app developer is doing this so they can deliver customer value faster, but at the cost of having an application that doesn’t fit into the native operating system. “Consistent look of their application across various OSes” usually means “invent your own custom look and behavior that isn’t standard for any OS”, which means customers have to spend extra time and effort to learn your bespoke UI customs rather than just use the ones they already know from their OS (gives the finger to Visual Studio Code).
The end result is that nobody gets to have an application that looks and feels like their chosen OS, and nobody gets to have a cross-platform application that looks and feels like other cross-platform applications unless they’re from the same vendor (and even then, it can be iffy).
> And users wouldn't be able to help each other, because a Mac user would know a different interface/look than a Linux user.
Not really. Customers can handle reasonable variations between software running on different operating systems. They do it today, for applications that work across different operating systems. Even when such an application takes most of its cues from another OS, there’s at least some deference to the native platform design in terms of menu design or control behavior.
What it would mean for the app developer is that they actually have to understand the design systems for the different OSes in order to build a design that can reasonably adapt to Mac/Linux/Windows. That’s why the “one design fits all” approach is so appealing when it comes to look and feel. You don’t have to understand anything. You don’t have to care about the native platform at all. There’s no extra work involved. You do whatever you want, the way you want it, and then throw it over the wall at the customer to deal with.
Agreed, especially about OSes not being consistent themselves. Even Apple dropped the ball with macOS being a mix of old Cocoa, SwiftUI-isms, and iPadOS transplants.
I think there's also a third reason: flat design. It used to be nearly impossible to make custom UIs look as polished as the native ones, with all their complex gradients and pixel-perfect bevelled edges. Now developers can use anything that can render a rectangle.
Microsoft doesn't even seem to care to dogfood these toolkits beyond the most basic apps bundled with Windows, and those basic apps really show why nobody should use them.
I have a laptop with a Ryzen 7 4800HS. It's a few years old, but it has no business feeling slow. Yet opening the new WinUI Explorer or Notepad shows visible lag in rendering the title/tab bar.
Native GUI toolkits with no benefit over using Electron; if anything, I've seen plenty of Electron apps that were snappier than these. WinUI 3 is a dumpster fire.
I never used WMR, but I never understood why it had to be part of the operating system. Why load that onto millions of computers when so few people use it?
IIRC, it was one of the things that Microsoft made difficult to remove. You could uninstall it through some PowerShell incantation, but chances are it would be back in a couple of Tuesdays.
I own two HP Reverb G2s and am not at all happy about this. I shouldn't be forced to choose between being able to use my hardware and getting security updates. This behavior on the part of Microsoft should be illegal.
Rust's type system for thread safety is actually remarkably simple. Types declare whether they add or remove thread safety (e.g. Mutex adds safety, non-atomic Rc removes). Structs automatically become non-thread-safe if they have non-thread-safe fields. Then all the functions that spawn threads or send data over channels require thread-safe types.
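A small illustration of how that propagates through `std::thread::spawn` (a rough sketch, not from the parent comment):

```rust
use std::rc::Rc;
use std::sync::Arc;
use std::thread;

fn main() {
    // Arc is Send + Sync, so it can cross the thread boundary.
    let shared = Arc::new(vec![1, 2, 3]);
    let handle = thread::spawn({
        let shared = Arc::clone(&shared);
        move || shared.iter().sum::<i32>()
    });
    println!("sum = {}", handle.join().unwrap());

    // Rc is !Send, so the same thing with Rc refuses to compile:
    let local = Rc::new(5);
    // thread::spawn(move || *local);
    // error[E0277]: `Rc<i32>` cannot be sent between threads safely
    drop(local);
}
```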
The fearless concurrency is real. It reliably prevents data races, use-after-free, and moving of thread-specific data to another thread. It works across arbitrarily large and complex call graphs, including 3rd party dependencies and dynamic callbacks. Plus immutability is strongly enforced, and global mutable state without synchronization is not allowed.
It doesn't prevent deadlocks, but compared to data corruption heisenbugs, these are pretty easy — attach a debugger and you can see exactly what deadlocked where.
It's not the Java model, because Java just makes every operation atomic (from the point of view of the model; I'm sure the JVM and javac must do optimizations to avoid some of them), while Rust enforces that multi-threaded access must go through atomic operations. But if there is no multi-threaded access, or the type is not meant to be used in a multi-threaded context, that information is encoded in the type system. This might sound like an academic distinction, but it is different: the developer is in control. You could even go as far as lying to the type system and claiming a racy type is actually thread safe. I wouldn't advise doing so, but I can't stop you.
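For the "lie to the type system" part, that escape hatch looks like this (deliberately unsound, shown only to make the point):

```rust
use std::cell::Cell;

// A wrapper that falsely promises thread safety for a non-atomic counter.
struct NotReallySafe(Cell<u64>);

// `unsafe impl` means the programmer overrides the compiler's judgment;
// any data race this enables is now entirely on us.
unsafe impl Send for NotReallySafe {}
unsafe impl Sync for NotReallySafe {}

fn main() {
    let c = NotReallySafe(Cell::new(0));
    c.0.set(1); // fine single-threaded; racy if actually shared across threads
    println!("{}", c.0.get());
}
```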
That's an incredibly broad criticism aimed at some hypothetical solutions you imagine, not grounded in what Rust does. No language can stop an imaginary infinitely determined fool.
Rust's restrictions, such as strict scopes of references and strongly enforced shared XOR mutable access, prevent many sloppy and careless designs that are possible in Java or C++.
Rust also takes advantage of its type system, generics, and ecosystem to offer solid constructs for multi-threading. There are safe data parallelism libraries, task queues, thread pools, scoped threads, channels, etc. Users are well equipped to implement multi-threading properly, and as much as possible Rust steers users towards locally-scoped, immutable or share-nothing solutions.
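For example, scoped threads let workers borrow directly from the parent stack frame, with no Arc and no unsafe (a minimal sketch):

```rust
use std::thread;

fn main() {
    let data = vec![1, 2, 3, 4, 5, 6];
    let (left, right) = data.split_at(3);
    let mut sums = [0i32; 2];
    let (a, b) = sums.split_at_mut(1);

    // The compiler proves these borrows end before `scope` returns,
    // so plain references can be shared with the spawned threads.
    thread::scope(|s| {
        s.spawn(|| a[0] = left.iter().sum());
        s.spawn(|| b[0] = right.iter().sum());
    });

    println!("{:?}", sums); // [6, 15]
}
```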
The main value of C and C++ is in their backward compatibility. Many projects don't even use current versions of the languages, and stick to decades-old subsets instead.
There are safety improvements that C++ could do right now, but is unwilling to do. For example, `[]` on vectors could have bounds checks. `->` on empty `std::optional` could trap instead of being UB. Rust accepts performance cost of these things, C++ doesn't want to.
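For comparison, this is the cost Rust chose to accept (a small sketch):

```rust
fn main() {
    let v = vec![1, 2, 3];
    // Indexing is always bounds-checked: going out of range panics instead of UB.
    // let boom = v[10]; // panics: index out of bounds
    // The checked accessor makes both the cost and the failure case explicit.
    assert_eq!(v.get(10), None);
    assert_eq!(v.get(1), Some(&2));
}
```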
C and C++ move too slowly and are too fragmented and ossified to make a difference in the foreseeable future. They couldn't copy even the easy stuff like Rust's painless dependency management (C++ modules are not going well). Even if a miracle-safety-feature was invented tomorrow, it will take many many years before it goes through cycles of standardization, implementation across vendors, and new compilers being available as a baseline.
Safety improvements won't be a magic compiler flag — C has already deeply invested in analyzers, sanitizers, and fuzzers to improve safety without rewrites. Rust's improvements come from a richer but stricter type system, and that requires refactoring. There are safer C dialects already that have these features, but nobody wants to add lifetime annotations and rewrite their pointer arithmetic to use slices. "Nobody's going to rewrite millions of lines of code" applies just as much to Safe-C.
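The "rewrite pointer arithmetic to use slices" refactor looks roughly like this in Rust terms (a hypothetical helper for illustration):

```rust
// Bounds travel with the data instead of living in the caller's head.
fn sum_window(data: &[u32], start: usize, len: usize) -> Option<u32> {
    let end = start.checked_add(len)?;
    let window = data.get(start..end)?; // checked slicing, no raw pointer math
    Some(window.iter().sum())
}

fn main() {
    let data = [1, 2, 3, 4, 5];
    assert_eq!(sum_window(&data, 1, 3), Some(9));
    assert_eq!(sum_window(&data, 4, 3), None); // would run past the end
}
```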
But the more Rust-like features they adopt, the easier it will be to automatically convert them to Rust.
Ugh, I'm worried that this could get ugly on social media, and it's just shooting the messenger.
Apple is very controlling about what the employees are allowed to say, and this is a cross between comments on future products/features and ongoing legal issues — things Apple really doesn't like commenting on.
Sort of shooting the messenger, but then again she is paid what, $3 - $500,000 TC to take those bullets… or maybe more? So that should soften the blow. I do prefer people be respectful and calm in voicing their distaste though
Nope, it could still be a bug - they typically remain silent even when major bugs are causing disruption. Just one of several ways Apple makes developing for Safari a nightmare.
abs() of this value also happens to exceed the maximum positive value. Which means that in C, a simple negation, -(-32768), can cause signed integer overflow and therefore UB.
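The same edge case is easy to poke at with Rust's checked integer operations (illustrative only; the UB described above is specific to C):

```rust
fn main() {
    let x = i16::MIN;                       // -32768
    // -(-32768) doesn't fit in i16, so the checked versions refuse:
    assert_eq!(x.checked_neg(), None);
    assert_eq!(x.checked_abs(), None);
    // The wrapping version hands the negative value straight back.
    assert_eq!(x.wrapping_abs(), i16::MIN);
}
```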