I was really struck by a comment Jonathan Blow made on stream recently: he said he's never written a parallel for loop in his whole career. I seem to recall the implication being that they're often not really necessary for performant code. There's also been some discussion lately about issues with asynchronous code in both Rust and Python. Point being that parallelism still has a ways to go before it's proven its usefulness. However, I agree with you that it would be nice to see more language tooling to make it simpler, since I work on some bits of code that I think could benefit from parallelization, but the amount of work I'd have to put in means it's a very low priority given the savings.
I never wrote a parallel for-loop in 15 years working on Firefox, because it's hard in C++, it's risky and difficult to maintain the thread-safety invariants, and it's not all that useful in most parts of the browser.
I write them quite often in Rust, because Rayon makes it super easy, there is almost no risk because the compiler checks the relevant thread-safety invariants, and I'm working on different problems where data parallelism is much more useful.
I've used them extensively in C++. Doing it manually by managing your own threads is a pain, but simple OpenMP-based parallel loops work really well, and they also support things like building vectors and simple reductions.
When your loop body uses complex library APIs over complex data, it's still hard in C++ to be confident that everything's thread-safe and you're avoiding data races.
Maybe it's not so hard if you're in a domain like HPC where the libraries you use are designed specifically to be used with data parallelism. But when you're pulling together code from different sources that may or may not have been used in an aggressively parallel application before...
I think it's less about libraries and more about the general approach to programming.
In the HPC world, software is usually doing one thing at a time. Most of the time it's either single-threaded, or there are multiple threads doing the same thing for independent chunks of data. There may be shared immutable data and private mutable data but very little shared mutable data. You avoid situations where the behavior of a thread depends on what the other threads are doing. Ideally, there is a single critical section doing simple things in a single place, which should make thread-safety immediately obvious.
You try to avoid being clever. You avoid complex control flows. You avoid the weird middle ground where things are not obviously thread-safe and not obviously unsafe. If you are unsure about an external library, you spend more time familiarizing yourself with it or you only use it in single-threaded contexts. Or you throw it away and reinvent the wheel.
If the APIs that you're interacting with are side-effect free then it's easy. If they are full of side effects, then they aren't written with multithreading in mind and you wouldn't be able to even compile it in Rust. C++ just takes off the training wheels.
It's a bit more complicated than that, because code can be thread-safe but not side-effect-free, but basically you're just restating what I said. C++ makes it hard to be sure code is really safe to use across threads, which means in practice developers should be more reluctant to do so.
The world is full of highly parallel programs getting useful work done. Most graphics, AI and compression libraries (picking 3 easy examples I've worked on) parallelize well, and can usually make use of all the cores you can throw at them.
Jonathan Blow makes good games, but chooses not to make particularly CPU intensive ones. That's fine, but that's also his choice.
He's also currently building one of the fastest compilers around. It's hard to believe he's never encountered use cases where parallelism makes sense.
Indeed, he wasn’t saying parallelism is not useful, just that the specific construct of a parallel for loop was not in his wheelhouse for certain reasons.
My impression of Jon's work is that he wants low-level enough access to his hardware that he's the one making the decisions about where and what runs. A language-level parallel for is definitely not that. :D
Parallelism and asynchronous code are not the same, and in the case of Rust they are very much not the same. Parallel for provides massive advantages for many things including game programming (from experience) so with all due respect I think this says more about Jonathan Blow than it does anything about "parallelism still needing to prove itself."
I wrote a parallel iteration (map-reduce) last week in some CPU-heavy code; it took 5 minutes with Rayon. It sped my code up by around 10x on a 12-core machine, with an example benchmark going from 7 seconds to 700 milliseconds. It's serious business.