Mojo vs. Rust: is Mojo faster than Rust? (modular.com)
54 points by samuell on Feb 12, 2024 | 31 comments


They have some promising features, but they need to learn Rust before dissing it. Their explanations have inaccuracies, and examples have novice mistakes.

• There's no implicit string copying in Rust. Even when passing String ownership, it will usually be passed in registers. It's idiomatic to use &str by default.

• Rust doesn't do eager drop, but their explanation is clumsy. The borrow checker doesn't influence destructors. &String and usize can't have a destructor, and can be forgotten at any time.

• Their benchmark intended to demonstrate tail call cost compiles to a constant, with no `factorial()` calls at run time. They're only benchmarking calls to black_box(1307674368000). Criterion put read_volatile in there. Mojo probably uses a zero-cost LLVM intrinsic instead.


Hi author here, definitely not trying to diss Rust, I love Rust! I'm pointing out some interesting overheads that aren't well known by the average Rust programmer, which Mojo was able to improve upon with the power of hindsight and being a newer thing. For your points below, let me clarify:

> There's no implicit string copying in Rust. Even when passing String ownership, it will usually be passed in registers.

The String metadata can be passed in registers if LLVM does that optimization, but it's not guaranteed and doesn't always happen. Rust move is just a memcpy, there are situations where LLVM doesn't optimize them away, resulting in Rust programs doing a lot more memcpy than people realize.

> It's idiomatic to use `&str` by default.

True if you want it to be immutable, but this actually adds to my point. That is the default behavior in Mojo without having to understand things like deref coercion and the difference between `&str` and `&String`. In Rust it's an unintuitive best practice, which everyone has to learn pretty early in their journey. In Mojo they get the best behavior by default, which gives them a more gentle learning curve, important for our Python audience. Default behavior > idiomatic things to learn.
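For anyone following along, a minimal sketch of the idiomatic pattern in question (hypothetical function, not from the blog):

    // Idiomatic: accept &str. Callers holding a String pass &s, and deref
    // coercion turns the &String into &str; no string bytes are copied.
    fn print_len(s: &str) {
        println!("{} bytes", s.len());
    }

    fn main() {
        let owned = String::from("hello");
        print_len(&owned);  // &String coerces to &str
        print_len("world"); // string literals are already &str
    }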

> The borrow checker doesn't influence destructors.

I didn't claim that; my point was that Rust does do runtime checks using drop flags to decide whether a value should be dropped. This can be resolved statically during compilation, but won't happen if the initialization state of an object is unknown at compile time: https://doc.rust-lang.org/nomicon/drop-flags.html
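A small sketch of the kind of case the nomicon page describes, where only a runtime drop flag can decide whether the destructor runs (hypothetical code, not from the blog):

    // When a value is only conditionally initialized, the compiler can't
    // know statically whether the destructor must run at end of scope, so
    // it tracks initialization with a hidden runtime drop flag.
    fn maybe_build(cond: bool) {
        let x: String;
        if cond {
            x = String::from("initialized");
            println!("{x}");
        }
        // whether `x` needs dropping here depends on `cond` at runtime
    }

    fn main() {
        maybe_build(std::env::args().count() > 1);
    }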

> &String and usize can't have a destructor, and can be forgotten at any time

In the example, the call stack grows with a new reference and usize in each frame for each call. This is why tail recursion in Rust has so many issues: those values need to be available to the end of scope to satisfy Rust's guarantees, so they can't be "forgotten at any time". It also overflows the stack a lot faster.

> Their benchmark intended to demonstrate tail call cost compiles to a constant, with no `factorial()` calls at run time. They're only benchmarking calls to black_box(1307674368000). Criterion put read_volatile in there. Mojo probably uses a zero-cost LLVM intrinsic instead.

If the Rust benchmark isn't calling `factorial()` it should be instant and faster than Mojo, but the Rust version is much slower. `benchmark.keep` in Mojo is a "clobber" directive, indicating that the value could be read or written at any time, so LLVM doesn't optimize away the function calls that produce the result.

Thanks for taking the time to read the post, and write out your thoughts. Really enjoying the discussion around these topics.


Rust optimizes factorial to be iterative, not using recursion (tail or otherwise) at all, and it turns `factorial(15, 1)` into `1307674368000`: https://rust.godbolt.org/z/bGrWfYKrP. As has been pointed out a few times, you're benchmarking `criterion::black_box` vs `benchmark.keep` (try the newer `std::hint::black_box`, which is built into the compiler and should have lower overhead).
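For concreteness, a sketch of an accumulator-style factorial and the built-in hint (assumed code, not necessarily the blog's exact benchmark):

    // Sketch of an accumulator-style factorial. With optimizations on,
    // rustc rewrites the recursion as a loop and folds factorial(15, 1)
    // into the constant 1307674368000 at the call site.
    fn factorial(n: u64, acc: u64) -> u64 {
        if n <= 1 { acc } else { factorial(n - 1, acc * n) }
    }

    fn main() {
        // std::hint::black_box is built into the compiler (stable since
        // Rust 1.66) and avoids criterion's older read_volatile approach.
        let result = std::hint::black_box(factorial(15, 1));
        println!("{result}");
    }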

And no: in the example with `&String` and `usize`, the stack isn't growing: https://rust.godbolt.org/z/6zW6WfGE7


I updated the blog with full benchmark reproduction instructions. I also removed criterion::black_box altogether, and it resulted in no performance difference. Removing benchmark.keep from Mojo causes it to optimize away everything and run in less than a picosecond.

If you could show me a benchmark that supports what you're saying that'd be great, thanks.


I did a lot more benchmarks, and Rust TCO is happening in a lot of scenarios. Thanks for pointing this out; I updated this section in the blog.


Hi, I think even the remaining benchmark isn't showing what you're trying to show:

https://rust.godbolt.org/z/r9rP6xohb

Rust realizes the vector is never used, so it never does any allocation or recursion; it just turns into a loop counting up to 999_999_999.

And some back of the napkin math says there's no way either benchmark is actually allocating anything. Even if malloc took 1 nanosecond (it _doesn't_), 999_999_999 nanoseconds is 0.999999999 seconds.

It _is_ somewhat surprising that Rust doesn't realize the loop can be completely optimized away, like it does without the unused Vec, but this benchmark still isn't showing what you're trying to show.
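A rough reconstruction of the pattern being described, assuming the blog's code allocates an unused Vec per call (the details may differ):

    // Rough sketch, not the blog's exact code: each call creates a Vec
    // that is never read. Under -O the optimizer is free to delete the
    // unused allocation and reduce the recursion to a counting loop
    // (without that rewrite, this depth would overflow the stack).
    fn recurse(n: u64) -> u64 {
        let _scratch: Vec<u64> = Vec::with_capacity(16);
        if n == 0 { 0 } else { 1 + recurse(n - 1) }
    }

    fn main() {
        println!("{}", recurse(999_999_999));
    }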


True, thanks! I updated the example again, and profiled it this time to make sure each program is actually allocating.


> The String metadata can be passed in registers if LLVM does that optimization, but it's not guaranteed and doesn't always happen. Rust move is just a memcpy, there are situations where LLVM doesn't optimize them away, resulting in Rust programs doing a lot more memcpy than people realize.

I believe even this is not quite correct, because memcpy here is an implementation detail of moves. Rust could relatively easily amend its (not yet standardized) ABI to avoid physically moving arguments larger than some threshold, like many C++ ABIs do, if needed. I don't know the current status, but AFAIK this has been considered multiple times in the past.


Passing arguments in registers is not an optimization, but part of the ABI. It always happens up to a certain number of arguments, and Rust in particular uses an ABI that flattens more structs into registers than C++ does.

Other moves could be memcpys, but there's a distinction between Rust saying moves behave like memcpy and moves actually being memcpys. String's 3×size_t (or 2×size_t for Box<str>/Arc<str>) is below LLVM's threshold for emitting an actual memcpy call. Rust also has optimization passes for eliminating redundant moves.
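Those sizes are easy to check directly:

    use std::mem::size_of;

    fn main() {
        assert_eq!(size_of::<String>(), 3 * size_of::<usize>());   // ptr + len + capacity
        assert_eq!(size_of::<Box<str>>(), 2 * size_of::<usize>()); // ptr + len
        assert_eq!(size_of::<std::sync::Arc<str>>(), 2 * size_of::<usize>());
        assert_eq!(size_of::<&String>(), size_of::<usize>());      // a single thin pointer
    }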

You're giving the impression that memcpy happens all over the place, whereas in reality it's quite rare, and it certainly doesn't happen in the simple cases you describe.

In Rust, knowledge of ownership and the zoo of strings is a requirement (e.g. use of &String is a novice error). It's nice that Mojo can hide it, and you could celebrate that without making dubious performance claims.

> True if you want it to be immutable, but this actually adds to my point.

Sigh, it adds to the inaccuracies. Mutable strings are &mut String, passed as a single pointer, so a mutable string is an even better case of a thin reference that doesn't need a memcpy.
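To make that concrete (hypothetical example):

    // A mutable string parameter is a single pointer; none of the String
    // metadata is copied at the call site.
    fn shout(s: &mut String) {
        s.push('!');
    }

    fn main() {
        let mut msg = String::from("hello");
        shout(&mut msg); // passes one pointer-sized argument
        println!("{msg}");
    }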

> those values need to be available to the end of scope to satisfy Rust's guarantees

No, they don't. You're conflating Rust's guaranteed Drop order (which does interfere with TCO) with borrow checking and stack usage, which don't. For references and Copy types, Rust has "eager drop" behavior: their existence on the stack is neither guaranteed nor necessary.
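A small illustration of that point (hypothetical code, not the blog's example):

    // &String and usize are Copy and have no Drop impl, so nothing about
    // them has to survive to the end of scope; the optimizer may discard
    // them as soon as they are no longer used.
    fn peek(s: &String, n: usize) -> usize {
        s.len() + n
    } // no destructor code is emitted for `s` or `n`

    fn main() {
        let owned = String::from("abc");
        println!("{}", peek(&owned, 7));
    } // `owned` is dropped here; the reference never was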

Borrow checking scopes are hypothetical, existing only for the sake of the check, and don't influence code generation in any way. You could literally remove the borrow checker and lifetimes from Rust entirely, and the code would compile to the same instructions; the mrustc implementation is a real example of that.

Your example function where you try to demonstrate how the arguments prevent TCO compiles to a single `ret`.

> it should be instant and faster than Mojo

The `factorial()` is instant, but the `black_box` isn't, because Rust/Criterion implements it differently than Mojo does. So Mojo has a faster `benchmark.keep` function; you failed to benchmark the relevant function and presented a misleading benchmark with a wrong conclusion.

You should validate your claims about what Rust does by actually checking the output. Try https://rust.godbolt.org/ (don't forget to add -O to the flags!) or use cargo-show-asm.


Hi, thanks for the discussion. I tried out a bunch of different benchmarks, and Rust TCO is actually working as you say it does, so I removed that part from the blog. I definitely need to upskill on assembly.


Here are at least three Rust veterans in this thread explaining that a move is just a memcpy which can be optimized away: https://users.rust-lang.org/t/move-semantics-rust-vs-c/61274...

I removed criterion::black_box from Rust and it made no performance difference, so I updated the blog. When I removed benchmark.keep from Mojo it ran in less than a picosecond, so I left it in to be fair.

Can you show me a benchmark to dispute my claim about recursion? I'm not sure what the generated assembly of a function that's not being run is meant to prove.


Rust's abstractions are all zero-cost except for bounds checking and integer overflow checking, so no language can be faster than Rust (assuming equal quality implementations) unless it either isn't memory safe or it can statically prove bounds more frequently, which pretty much requires dependent types (but current languages with dependent types box everything AFAIK, so they are terrible performance-wise).
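To make the bounds-checking caveat concrete, a small sketch comparing indexed and iterator-based loops:

    // Indexing a slice carries a bounds check; iterator-based code stays
    // in bounds by construction, so the check disappears without unsafe.
    fn sum_indexed(xs: &[u64]) -> u64 {
        let mut total = 0;
        for i in 0..xs.len() {
            total += xs[i]; // bounds check (often, but not always, elided)
        }
        total
    }

    fn sum_iter(xs: &[u64]) -> u64 {
        xs.iter().sum() // no bounds checks needed
    }

    fn main() {
        let data = vec![1, 2, 3, 4];
        assert_eq!(sum_indexed(&data), sum_iter(&data));
    }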

Mojo seems to not only lack dependent types but also to lack first-class references with lifetimes (there is no lifetime kind), having only borrowed function arguments.

So the answer is no, it is going to be slower since some patterns cannot be expressed and would require costly abstractions.

For example, it looks like a hash table lookup in Mojo requires copying the value, while Rust just returns a reference to the value inside the hash table. Rust can do this because it can express that the return value has the same lifetime as the `self` parameter, thanks to first-class lifetimes and references (although in this case the lifetimes can be elided).
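A sketch of what that looks like on the Rust side (assumed example, not from the post):

    use std::collections::HashMap;

    // The returned reference borrows from the map; nothing is copied out.
    // In a method taking &self the lifetime below could be elided.
    fn lookup<'a>(map: &'a HashMap<String, String>, key: &str) -> Option<&'a str> {
        map.get(key).map(|v| v.as_str())
    }

    fn main() {
        let mut m = HashMap::new();
        m.insert("lang".to_string(), "Mojo".to_string());
        println!("{:?}", lookup(&m, "lang"));
    }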


> so no language can be faster than Rust (assuming equal quality implementations) unless it either isn't memory safe or it can statically prove bounds more frequently

I think this is exaggerating things a little. There are definitely areas where performance could be improved if Rust supported additional features. One big one is more precise control over memory layout to reduce cache misses, like the ability to switch between structs of arrays and arrays of structs. Another is a ffast-math equivalent, which Rust currently lacks.
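A sketch of the layout difference in question; today the conversion has to be written (or derived) by hand (hypothetical types):

    #![allow(dead_code)]

    // Array-of-structs: one particle's fields sit together, so a pass that
    // only reads `x` still drags `y` and `mass` through the cache.
    struct Particle { x: f32, y: f32, mass: f32 }
    struct ParticlesAos(Vec<Particle>);

    // Struct-of-arrays: each field is contiguous, which is friendlier to
    // caches and SIMD, but the transformation must be maintained by hand.
    struct ParticlesSoa {
        x: Vec<f32>,
        y: Vec<f32>,
        mass: Vec<f32>,
    }

    fn total_x(p: &ParticlesSoa) -> f32 {
        p.x.iter().sum()
    }

    fn main() {
        let soa = ParticlesSoa { x: vec![1.0, 2.0], y: vec![0.0; 2], mass: vec![1.0; 2] };
        println!("{}", total_x(&soa));
    }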


Hi author here, Mojo has lifetimes and references! But they're in their infancy and being worked out in the standard library, before we document how to use them for external users. We need to iterate on the syntax and usability.

Your point about hash tables is true; we have an open ticket to improve this in the stdlib. There are a lot of things that can be improved in the stdlib now that we've added features like references, and we have dedicated engineers for this.


How is Mojo going to handle dependencies? This is something that Rust handles much better than Python with essentially no debate. Copying from Python here would be a huge error. How can we guide Mojo toward something closer to Crates?


This is all in the design stage, we realize how important it is to the wider community!


A solution like poetry or pdm would seem most appropriate, and most closely resembles cargo.

Built-in formatting, testing and perhaps even linting would be nice. That’s how Go, Rust and Python tools like Rye do it, and I prefer it that way.


For what it's worth, Poetry and pdm paper over the issue and are still profoundly flawed. Rye is a bit better, but still not there.


I am interested in Mojo, but this is not the way to compare languages. The author seems to be a novice Rust user with no practical experience in serious Rust projects.

They are comparing orthogonal components and features. CoW works in Rust as well, and you can use `impl Into` for much greater performance and ease of use. Questioning the `dbg!` macro is just farcical: its use of `T` is justified because it returns `T` back. You can't accept `&T` and return `T` in Rust.
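For concreteness, a small sketch of both points (set_name is a hypothetical function, not from the blog):

    // `impl Into` lets callers pass either a &str or an owned String, so
    // those who already own a String don't pay for a second allocation:
    fn set_name(name: impl Into<String>) -> String {
        name.into()
    }

    // `dbg!` takes its argument by value and returns it, which is exactly
    // why it is written over `T` rather than `&T`:
    fn main() {
        let s = set_name("mojo");
        let n = dbg!(s.len()) * 2; // prints file/line and the value, then keeps going
        println!("{n}");
    }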

Furthermore, I am not sure what they are trying to prove with this article. I would use Mojo if it were open source and supported the entirety of the Python ecosystem, even if it were only 10-20x faster than Python rather than 100x.

And I work on open source AI platforms as my day job. Why write this nonsensical post? Is it just for clicks?

If I see another post like this I might just stop caring about Mojo from now on.

Why pay attention when the devs are more interested in flexing than in making their work open source? Is this some sort of scam? Will you make it Business Source License? Will it ever support all of the Python ecosystem, and when? Those are far more important questions for me.

Also, perhaps improve Python's tooling while you're at it. The Astral.sh folks made ruff and uv, and the Prefix folks made rip, pixi and rattler-build... Maybe now improve something else?


I’m excited to try Mojo. I just don't want to create an account just to try it.


Agreed, it looks interesting, but it's odd that you need an account. Glancing at their Terms of Use, I think you're also agreeing they can use your "User Generated Content" (including code) for themselves.


Maybe they are tracking the number of accounts for their next raise from VCs.


Nice, Mojo seems awesome.

It has better ergonomics than Rust, and it's in the same performance ballpark (or better?).

I just don't trust closed-source PLs...


You don't actually explain why Mojo doesn't need Pin; you just say that it works. Does Mojo automatically fix up self-referential pointers on moves or something?


One thing I don't like about Rust is its verbosity. It's nice that it borrowed some concepts from functional programming and other niceties, but heck, why make the syntax so long and complex? Same as Java in a way; it's like people love to create 3 km-long function names in a standalone file lost inside 4 levels of folders. Why? I'd prefer the syntax of Mojo, which reminds me of the simplicity of Python/Julia/Ruby/Haskell. I'm not saying all languages should look like these; PHP, Scala, Kotlin and Golang are quite OK in my opinion. I'm not talking about performance, but about ease of use, readability, conciseness, no nonsense... Programming languages should be built for humans in the first place, at least I think so.

</rant>



The last code example is how I wish Rust looked; I guess I'm the wrong audience.


FWIW, most of my code reads closer to the third-to-last version, which only adds the &s and ?s.


> Mojo vs. Rust: is Mojo faster than Rust?

The better question would be: can you compile Mojo? Because for Rust you seem to need specific Rust compiler versions to compile the current one (I was never able to do it).


I know the last couple of Mojo posts have sparked some controversy, but I thought this post at least clarified a few things, so it's worth a read I think.


no



