Different elements of the output vector may take very different amounts of time to compute. If you parallelize with the split_at_mut API, you won’t be able to saturate all cores, because the thread that does the splitting can’t possibly know how much time each slice is going to take.
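To make that concrete, here is a rough sketch of a static split using only the standard library. The `expensive` function and `fill_static` are made-up names for illustration, and the cost model (work grows linearly with the index) is an assumption, not something measured:

```rust
use std::thread;

// Hypothetical workload: the cost of element i grows with i, so the
// later elements are much more expensive than the earlier ones.
fn expensive(i: usize) -> u64 {
    (0..(i as u64) * 1_000).fold(0u64, |acc, x| acc.wrapping_add(x ^ acc.rotate_left(5)))
}

// Static split: the slice is cut in half once, up front, and each half
// is handed to its own scoped thread. Nothing rebalances the work later.
fn fill_static(out: &mut [u64]) {
    let mid = out.len() / 2;
    let (left, right) = out.split_at_mut(mid);
    thread::scope(|s| {
        s.spawn(|| {
            for (i, v) in left.iter_mut().enumerate() {
                *v = expensive(i);
            }
        });
        s.spawn(|| {
            // Indices in the right half are offset by `mid`.
            for (i, v) in right.iter_mut().enumerate() {
                *v = expensive(mid + i);
            }
        });
    });
}
```

Under this assumed cost model the right half carries roughly three times the work of the left half, so the first thread finishes early and its core sits idle while the second one is still running.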
Sometimes you don’t want libraries, instead you want to implement similar stuff in your own code. And Rayon uses unsafe to work around the compiler limitation of safe Rust I was talking about.
> Sometimes you don’t want libraries, instead you want to implement similar stuff in your own code.
That's always the tradeoff, isn't it? You can implement the logic yourself, or modify three lines and immediately get parallel evaluation of your loop (add the dependency to Cargo.toml, add an import statement and change an .iter() call to .par_iter()).
> And Rayon uses unsafe to work around the compiler limitation of safe Rust I was talking about.
So does the standard library. Using unsafe is not a cheat, and it's not a defeat. It lets the library developer express something that the borrow checker cannot yet comprehend, at the cost of the developer taking responsibility for upholding the language's invariants.
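For example, a split_at_mut-style function can be written as a safe interface over a small unsafe core. This is a simplified sketch along the lines of the well-known example in The Rust Programming Language book; `my_split_at_mut` is a hypothetical name, and the real standard library implementation differs in detail:

```rust
use std::slice;

// Hypothetical re-implementation, for illustration only.
// The signature is entirely safe; callers never see the raw pointers.
fn my_split_at_mut<T>(values: &mut [T], mid: usize) -> (&mut [T], &mut [T]) {
    let len = values.len();
    let ptr = values.as_mut_ptr();

    // Checked up front so the unsafe block below stays in bounds.
    assert!(mid <= len);

    // SAFETY: the ranges [0, mid) and [mid, len) are both inside the
    // original slice and do not overlap, so the two mutable slices
    // never alias. The borrow checker cannot prove this on its own.
    unsafe {
        (
            slice::from_raw_parts_mut(ptr, mid),
            slice::from_raw_parts_mut(ptr.add(mid), len - mid),
        )
    }
}
```

The author of the function carries the proof obligation in the SAFETY comment; everyone calling it gets two non-overlapping mutable slices without writing any unsafe themselves.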
Those "safe rust limitations" are the point, not an accident or misfeature. "If we restrict ourselves to handling the 90% most common cases of problems, we can automate the checks and provide an escape hatch for the other 10%" is the unofficial Rust ethos! The alternatives would be to either sacrifice performance in the general case or sacrifice safety in the general case.
Btw, the author of rayon is Niko Matsakis. It's not part of stdlib for many reasons, but the quality of implementation is not one of them.
The poster has figured out a significant performance leak, their Y-axis is no longer nonsense, and they've got a peak indication so that we know the theoretical best possible numbers (no practical software will get there, but OpenMP is indeed closer than Rayon).