They have some promising features, but they need to learn Rust before dissing it. Their explanations have inaccuracies, and examples have novice mistakes.
• There's no implicit string copying in Rust. Even when passing String ownership, it will usually be passed in registers. It's idiomatic to use &str by default.
• Rust doesn't do eager drop, but their explanation is clumsy. The borrow checker doesn't influence destructors. &String and usize can't have a destructor, and can be forgotten at any time.
• Their benchmark intended to demonstrate tail call cost compiles to a constant, with no `factorial()` calls at run time. They're only benchmarking calls to black_box(1307674368000). Criterion put read_volatile in there. Mojo probably uses a zero-cost LLVM intrinsic instead.
Hi author here, definitely not trying to diss Rust, I love Rust! I'm pointing out some interesting overheads that aren't well known by the average Rust programmer, which Mojo was able to improve upon with the power of hindsight and being a newer thing. For your points below, let me clarify:
> There's no implicit string copying in Rust. Even when passing String ownership, it will usually be passed in registers.
The String metadata can be passed in registers if LLVM does that optimization, but it's not guaranteed and doesn't always happen. A Rust move is just a memcpy, and there are situations where LLVM doesn't optimize it away, so Rust programs end up doing a lot more memcpy than people realize.
> It's idiomatic to use `&str` by default.
True if you want it to be immutable, but this actually adds to my point. That is the default behavior in Mojo without having to understand things like deref coercion and the difference between `&str` and `&String`. In Rust it's an unintuitive best practice, which everyone has to learn pretty early in their journey. In Mojo they get the best behavior by default, which gives them a more gentle learning curve, important for our Python audience. Default behavior > idiomatic things to learn.
> The borrow checker doesn't influence destructors.
I didn't claim that; my point was that Rust does do runtime checks, using drop flags, to decide whether a value should be dropped. This can be resolved statically during compilation, but won't be if the initialization state of an object is unknown at compile time: https://doc.rust-lang.org/nomicon/drop-flags.html
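A minimal sketch of the kind of code that needs a runtime drop flag (`flip` here is just a stand-in for any condition unknown at compile time):

```rust
fn main() {
    let flip = std::env::args().len() > 1; // not knowable at compile time
    let x;
    if flip {
        x = Box::new(42);
        println!("{x}");
    }
    // Whether `x` holds a live Box at the end of scope depends on `flip`,
    // so the compiler keeps a hidden drop flag and checks it before returning.
}
```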
> &String and usize can't have a destructor, and can be forgotten at any time
In the example, the call stack is growing with a new reference and usize in each frame for each call. This is why tail recursion in Rust has so many issues, those values need to be available to the end of scope to satisfy Rust's guarantees, they can't be "forgotten at any time". It also overflows the stack a lot faster.
> Their benchmark intended to demonstrate tail call cost compiles to a constant, with no `factorial()` calls at run time. They're only benchmarking calls to black_box(1307674368000). Criterion put read_volatile in there. Mojo probably uses a zero-cost LLVM intrinsic instead.
If the Rust benchmark isn't calling `factorial()`, it should be instant and faster than Mojo, but the Rust version is much slower. `benchmark.keep` in Mojo is a "clobber" directive, indicating that the value could be read or written at any time, so LLVM doesn't optimize away the function calls that produce the result.
Thanks for taking the time to read the post, and write out your thoughts. Really enjoying the discussion around these topics.
Rust optimizes factorial to be iterative, not using recursion (tail or otherwise) at all, and it turns `factorial(15, 1)` into `1307674368000`: https://rust.godbolt.org/z/bGrWfYKrP. As has been pointed out a few times, you're benchmarking `criterion::black_box` vs `benchmark.keep` (try the newer `std::hint::black_box`, which is built into the compiler and should have lower overhead).
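For anyone following along, here's a rough sketch of that hinted variant (not the blog's actual harness); wrapping the inputs in `std::hint::black_box` as well is what keeps the call from folding to a constant:

```rust
use std::hint::black_box;

fn factorial(n: u64, acc: u64) -> u64 {
    if n <= 1 { acc } else { factorial(n - 1, acc * n) }
}

fn main() {
    // black_box on the inputs stops LLVM from folding the whole call
    // into the literal 1307674368000 at compile time.
    let result = black_box(factorial(black_box(15), black_box(1)));
    println!("{result}");
}
```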
I updated the blog with full benchmark reproduction instructions. I also removed criterion::black_box altogether, and it resulted in no performance difference. Removing benchmark.keep from Mojo causes it to optimize away everything and run in less than a picosecond.
If you could show me a benchmark that supports what you're saying that'd be great, thanks.
Rust realizes the vector is never used, so it never does any allocation or recursion; it just turns into a loop counting up to 999_999_999.
And some back of the napkin math says there's no way either benchmark is actually allocating anything. Even if malloc took 1 nanosecond (it _doesn't_), 999_999_999 nanoseconds is 0.999999999 seconds.
It _is_ somewhat surprising that Rust doesn't realize the loop can be completely optimized away, like it does without the unused Vec, but this benchmark still isn't showing what you're trying to show.
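The exact benchmark isn't quoted in this thread, but the pattern under discussion is roughly this shape (a hypothetical reconstruction, not the blog's code):

```rust
// Hypothetical stand-in for the benchmarked function.
fn churn(n: u32) -> u32 {
    if n == 0 {
        return 0;
    }
    let _v: Vec<u8> = Vec::with_capacity(16); // allocated but never read
    churn(n - 1) + 1
}

fn main() {
    // Per the comments above: the unused Vec's allocation gets eliminated
    // and the recursion becomes a plain counting loop up to 999_999_999.
    println!("{}", churn(999_999_999));
}
```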
> The String metadata can be passed in registers if LLVM does that optimization, but it's not guaranteed and doesn't always happen. Rust move is just a memcpy, there are situations where LLVM doesn't optimize them away, resulting in Rust programs doing a lot more memcpy than people realize.
I believe even this is not quite correct, because memcpy here is an implementation detail of moves. Rust could relatively easily amend its (not yet standardized) ABI to not physically move arguments larger than some threshold, like many C++ ABIs do, if needed. I don't know the current status, but AFAIK it has been considered multiple times in the past.
Passing arguments in registers is not an optimization, but an ABI. It always happens up to a certain number of arguments, and Rust in particular uses an ABI that flattens more structs into registers than C++.
Other moves could be memcpys, but there's a distinction between Rust saying moves behave like memcpy, and moves actually being memcpys. String's 3×size_t (or 2×size_t for Box<str>/Arc<str>) is below LLVM's threshold for emitting an actual memcpy call. Rust has optimization passes for eliminating redundant moves.
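To make the sizes concrete (a quick sanity check, not from the original post):

```rust
use std::mem::size_of;

fn main() {
    // String is (pointer, capacity, length): three machine words.
    assert_eq!(size_of::<String>(), 3 * size_of::<usize>());
    // Box<str> and Arc<str> are (pointer, length): two words.
    assert_eq!(size_of::<Box<str>>(), 2 * size_of::<usize>());
    assert_eq!(size_of::<std::sync::Arc<str>>(), 2 * size_of::<usize>());
    // &str is also two words; &String and &mut String are a single pointer.
    assert_eq!(size_of::<&str>(), 2 * size_of::<usize>());
    assert_eq!(size_of::<&mut String>(), size_of::<usize>());
}
```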
You're giving an impression that memcpy happens all over the place, where in reality it's quite rare, and certainly doesn't happen in the simple cases you describe.
In Rust, knowledge of ownership and the zoo of strings is a requirement (e.g. use of &String is a novice error). It's nice that Mojo can hide it, and you could celebrate that without making dubious performance claims.
> True if you want it to be immutable, but this actually adds to my point.
Sigh, it adds to inaccuracies. Mutable strings are &mut String, passed as a single pointer, so a mutable string is an even better case of a thin reference that doesn't need memcpy.
> those values need to be available to the end of scope to satisfy Rust's guarantees
No they don't. You're conflating Rust's guaranteed Drop order (which does interfere with TCO) with borrow checking and stack usage, which don't. For references and Copy types, Rust has an "eager drop" behavior. Their existence on the stack is not guaranteed nor necessary.
Borrow checking scopes are hypothetical for the sake of the check, and don't influence code generation in any way. You can literally remove borrow checker and lifetimes from Rust entirely, and the code will compile to the same instructions — mrustc implementation is a real example of that.
Your example function where you try to demonstrate how the arguments prevent TCO compiles to a single `ret`.
> it should be instant and faster than Mojo
The `factorial()` is instant, but the `black_box` isn't because Rust/Criterion implements it differently than Mojo. So Mojo has a faster `benchmark.keep` function, and you failed to benchmark the relevant function, and presented a misleading benchmark with a wrong conclusion.
You should validate your claims on what Rust does by actually checking the output. Try https://rust.godbolt.org/ (don't forget to add -O to flags!) or using cargo-show-asm.
Hi, thanks for the discussion. I tried out a bunch of different benchmarks, and Rust TCO is actually working as you say it does, so I removed that part from the blog. I definitely need to upskill on assembly.
I removed criterion::black_box from Rust and it made no performance difference, so I updated the blog. If I removed benchmark.keep from Mojo, it ran in less than a picosecond, so I left it in to be fair.
Can you show me a benchmark to dispute my claim about recursion? I'm not sure what the generated assembly of a function that's not being run is meant to prove.
This can happen in languages that use dynamic constructs that can't be optimized out. For example, there was a PHP-to-native compiler (HipHop/HPHPc) that lost to faster interpreters and JIT.
Apple's Rosetta 2 translates x86-64 to aarch64 code that runs surprisingly fast, despite being mostly a straightforward translation of instructions rather than something clever like a recompiling, optimizing JIT.
And plain old C is relatively fast without optimizations, because it doesn't rely on abstraction layers being optimized out.
I don't think you can generalize it like that. It really depends whether the complexity is necessary or not, and that's context-dependent.
If the implementation is too simple, it may have issues that impact its reliability. For example, code may use a simple linear search instead of fancier data structures/algorithms, and struggle with larger inputs or have accidentally-quadratic code paths. Or it may update data files without the complexity of atomic writes or inter-process locking, and corrupt its data if it's unexpectedly killed/restarted.
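A contrived Rust sketch of the accidentally-quadratic pattern (all names made up for illustration):

```rust
use std::collections::HashSet;

// O(n²): every new item scans the whole `seen` vector.
fn dedup_naive(items: &[String]) -> Vec<String> {
    let mut seen: Vec<String> = Vec::new();
    for item in items {
        if !seen.contains(item) {       // linear search per element
            seen.push(item.clone());
        }
    }
    seen
}

// Roughly O(n): a hash set makes each membership check constant time on average.
fn dedup_fast(items: &[String]) -> Vec<String> {
    let mut seen: HashSet<&str> = HashSet::new();
    let mut out = Vec::new();
    for item in items {
        if seen.insert(item.as_str()) {
            out.push(item.clone());
        }
    }
    out
}
```

Both look fine at a hundred entries; only one of them is still fine at ten million.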
Implementations that are too simple may be less secure, e.g. lack features for precise access controls, or sandboxing. Simple parsers can become a vulnerability (e.g. request smuggling attacks from underestimating complexity of edge cases in HTTP and URLs).
Too-simple tools may just push complexity and security risks elsewhere. There's an endless supply of simple templating engines that make it too easy to introduce XSS vulnerabilities.
For someone who’s not a Windows developer this situation is incredibly confusing. I have no idea what is the correct way to make a modern Windows application.
It seems like everything made after 1995 is deprecated abandonware, and whatever UI they’re building next is going to be dead before they finish implementing it.
Genuinely asking: nowadays, are there desktop UI applications beyond the well-known ones? Many of the new popular ones (e.g. Slack) basically have an HTML renderer.
App developers want a consistent look for their application across various OSes, be it Mac/Win/Linux. This means developing with GTK for Linux and WinUI for Windows is out of the question. Why do app developers want a consistent look? Because otherwise they would have to create documentation and screenshots for each OS separately. And users wouldn't be able to help each other, because a Mac user would know a different interface/look than a Linux user.
OS developers like Apple and Microsoft don't have a singular look themselves. Microsoft didn't rewrite everything into the WinUI settings, and anytime anyone needs to do anything serious they have to go through the Win32 settings panels. This lack of OS coherency means app developers don't feel the need to stay consistent with the OS, since the OS itself is not uniform.
> App developers want a consistent look for their application across various OSes, be it Mac/Win/Linux.
No, app developers want to write their code once. A consistent look is just a byproduct of that desire.
I’m also glad you didn’t mention anything about the customer, because ostensibly the app developer is doing this so they can deliver customer value faster, but at the cost of having an application that doesn’t fit into the native operating system. “Consistent look of their application across various OSes” usually means “invent your own custom look and behavior that isn’t standard for any OS”, which means customers have to spend extra time and effort to learn your bespoke UI customs rather than just use the ones they already know from their OS (gives the finger to Visual Studio Code).
The end result is that nobody gets to have an application that looks and feels like their chosen OS, and nobody gets to have a cross-platform application that looks and feels like other cross-platform applications unless they’re from the same vendor (and even then, it can be iffy).
> And users wouldn't be able to help each other, because a Mac user would know a different interface/look than a Linux user.
Not really. Customers can handle reasonable variations between software running on different operating systems. They do it today, for applications that work across different operating systems. Even when such an application takes most of its cues from another OS, there’s at least some deference to the native platform design in terms of menu design or control behavior.
What it would mean for the app developer is that they actually have to understand the design systems for the different OSes in order to build a design that can reasonably adapt to Mac/Linux/Windows. That’s why the “one design fits all” approach is so appealing when it comes to look and feel. You don’t have to understand anything. You don’t have to care about the native platform at all. There’s no extra work involved. You do whatever you want, the way you want it, and then throw it over the wall at the customer to deal with.
Agreed, especially about OSes not being consistent themselves. Even Apple dropped the ball with macOS being a mix of old Cocoa, SwiftUI-isms, and iPadOS transplants.
I think there's also a third reason: flat design. It used to be nearly impossible to make custom UIs look as polished as the native ones, with all their complex gradients and pixel-perfect bevelled edges. Now developers can use anything that can render a rectangle.
Microsoft doesn't even seem to care to dogfood these toolkits beyond the most basic apps bundled with Windows, and those basic apps really show why nobody should use them.
I have a laptop with a Ryzen 7 4800HS. It's a few years old, but it has no business feeling slow. Yet opening the new WinUI Explorer or Notepad shows visible lag in rendering the title/tab bar.
Native GUI toolkits with no benefit over using Electron; if anything, I've seen plenty of Electron apps that were snappier than these. WinUI 3 is a dumpster fire.
I never used WMR, but I never understood why it had to be part of the operating system. Why load that onto millions of computers when so few people use it?
IIRC, it was one of the things that Microsoft made difficult to remove. You could uninstall it through some PowerShell incantation, but chances are it would be back in a couple of Tuesdays.
I own two HP Reverb G2s and am not at all happy about this. I shouldn't be forced to choose between being able to use my hardware and getting security updates. This behavior on the part of Microsoft should be illegal.
Rust's type system for thread safety is actually remarkably simple. Types declare whether they add or remove thread safety (e.g. Mutex adds safety, non-atomic Rc removes). Structs automatically become non-thread-safe if they have non-thread-safe fields. Then all the functions that spawn threads or send data over channels require thread-safe types.
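A small illustration of how that propagates through `std::thread::spawn` (a rough sketch, not from the parent comment):

```rust
use std::rc::Rc;
use std::sync::Arc;
use std::thread;

fn main() {
    // Arc is Send + Sync, so it can cross the thread boundary.
    let shared = Arc::new(vec![1, 2, 3]);
    let handle = thread::spawn({
        let shared = Arc::clone(&shared);
        move || shared.iter().sum::<i32>()
    });
    println!("sum = {}", handle.join().unwrap());

    // Rc is !Send, so the same thing with Rc refuses to compile:
    let local = Rc::new(5);
    // thread::spawn(move || *local);
    // error[E0277]: `Rc<i32>` cannot be sent between threads safely
    drop(local);
}
```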
The fearless concurrency is real. It reliably prevents data races, use-after-free, and moving of thread-specific data to another thread. It works across arbitrarily large and complex call graphs, including 3rd party dependencies and dynamic callbacks. Plus immutability is strongly enforced, and global mutable state without synchronization is not allowed.
It doesn't prevent deadlocks, but compared to data corruption heisenbugs, these are pretty easy — attach a debugger and you can see exactly what deadlocked where.
It's not the Java model, because Java just makes every operation atomic (from the point of view of the model; I'm sure the JVM and javac must do optimizations to avoid some of them), while Rust enforces that multi-threaded access must go through atomic operations. But if there is no multi-threaded access, or the type is not meant to be used in a multi-threaded context, that information is encoded in the type system. This might sound like an academic distinction, but it is different: the developer is in control. You could even go as far as lying to the type system and claiming a racy type is actually thread safe. I wouldn't advise doing so, but I can't stop you.
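For the "lie to the type system" part, that escape hatch looks like this (deliberately unsound, shown only to make the point):

```rust
use std::cell::Cell;

// A wrapper that falsely promises thread safety for a non-atomic counter.
struct NotReallySafe(Cell<u64>);

// `unsafe impl` means the programmer overrides the compiler's judgment;
// any data race this enables is now entirely on us.
unsafe impl Send for NotReallySafe {}
unsafe impl Sync for NotReallySafe {}

fn main() {
    let c = NotReallySafe(Cell::new(0));
    c.0.set(1); // fine single-threaded; racy if actually shared across threads
    println!("{}", c.0.get());
}
```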
That's an incredibly broad criticism aimed at some hypothetical solutions you imagine, not grounded in what Rust does. No language can stop an imaginary infinitely determined fool.
Rust's restrictions, such as strict scopes of references and strongly enforced shared XOR mutable access, prevent many sloppy and careless designs that are possible in Java or C++.
Rust also takes advantage of its type system, generics, and ecosystem to offer solid constructs for multi-threading. There are safe data parallelism libraries, task queues, thread pools, scoped threads, channels, etc. Users are well equipped to implement multi-threading properly, and as much as possible Rust steers users towards locally-scoped, immutable or share-nothing solutions.
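For example, scoped threads let workers borrow directly from the parent stack frame, with no Arc and no unsafe (a minimal sketch):

```rust
use std::thread;

fn main() {
    let data = vec![1, 2, 3, 4, 5, 6];
    let (left, right) = data.split_at(3);
    let mut sums = [0i32; 2];
    let (a, b) = sums.split_at_mut(1);

    // The compiler proves these borrows end before `scope` returns,
    // so plain references can be shared with the spawned threads.
    thread::scope(|s| {
        s.spawn(|| a[0] = left.iter().sum());
        s.spawn(|| b[0] = right.iter().sum());
    });

    println!("{:?}", sums); // [6, 15]
}
```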
The main value of C and C++ is in their backward compatibility. Many projects don't even use current versions of the languages, and stick to decades-old subsets instead.
There are safety improvements that C++ could do right now, but is unwilling to do. For example, `[]` on vectors could have bounds checks. `->` on empty `std::optional` could trap instead of being UB. Rust accepts performance cost of these things, C++ doesn't want to.
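For comparison, this is the cost Rust chose to accept (a small sketch):

```rust
fn main() {
    let v = vec![1, 2, 3];
    // Indexing is always bounds-checked: going out of range panics instead of UB.
    // let boom = v[10]; // panics: index out of bounds
    // The checked accessor makes both the cost and the failure case explicit.
    assert_eq!(v.get(10), None);
    assert_eq!(v.get(1), Some(&2));
}
```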
C and C++ move too slowly and are too fragmented and ossified to make a difference in the foreseeable future. They couldn't copy even the easy stuff like Rust's painless dependency management (C++ modules are not going well). Even if a miracle-safety-feature was invented tomorrow, it will take many many years before it goes through cycles of standardization, implementation across vendors, and new compilers being available as a baseline.
Safety improvements won't be a magic compiler flag — C has already deeply invested in analyzers, sanitizers, and fuzzers to improve safety without rewrites. Rust's improvements come from a richer but stricter type system, and that requires refactoring. There are safer C dialects already that have these features, but nobody wants to add lifetime annotations and rewrite their pointer arithmetic to use slices. "Nobody's going to rewrite millions of lines of code" applies just as much to Safe-C.
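The "rewrite pointer arithmetic to use slices" refactor looks roughly like this in Rust terms (a hypothetical helper for illustration):

```rust
// Bounds travel with the data instead of living in the caller's head.
fn sum_window(data: &[u32], start: usize, len: usize) -> Option<u32> {
    let end = start.checked_add(len)?;
    let window = data.get(start..end)?; // checked slicing, no raw pointer math
    Some(window.iter().sum())
}

fn main() {
    let data = [1, 2, 3, 4, 5];
    assert_eq!(sum_window(&data, 1, 3), Some(9));
    assert_eq!(sum_window(&data, 4, 3), None); // would run past the end
}
```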
But the more Rust-like features they adopt, the easier it will be to automatically convert them to Rust.
Ugh, I'm worried that this could get ugly on social media, and it's just shooting the messenger.
Apple is very controlling about what the employees are allowed to say, and this is a cross between comments on future products/features and ongoing legal issues — things Apple really doesn't like commenting on.
Sort of shooting the messenger, but then again she is paid what, $3 - $500,000 TC to take those bullets… or maybe more? So that should soften the blow. I do prefer people be respectful and calm in voicing their distaste though
Nope, it could still be a bug - they typically remain silent even when major bugs are causing disruption. Just one of several ways Apple makes developing for Safari a nightmare.
abs() of this value also happens to exceed the maximum positive value. Which means that in C, a simple negation, -(-32768), can cause signed integer overflow and therefore UB.
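The same edge case is easy to poke at with Rust's checked integer operations (illustrative only; the UB described above is specific to C):

```rust
fn main() {
    let x = i16::MIN;                       // -32768
    // -(-32768) doesn't fit in i16, so the checked versions refuse:
    assert_eq!(x.checked_neg(), None);
    assert_eq!(x.checked_abs(), None);
    // The wrapping version hands the negative value straight back.
    assert_eq!(x.wrapping_abs(), i16::MIN);
}
```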