It's not about the number of available threads. The very act of scheduling tasks across multiple threads incurs scheduling and communication overhead, and in many situations the parallel version actually ends up slower than running everything on a single thread.
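A minimal Python sketch of that effect: the work per item is a single multiplication, so the cost of pickling each task and shipping it to a worker process dominates, and the pool usually loses to the plain loop (exact timings vary by machine):

    import time
    from concurrent.futures import ProcessPoolExecutor

    def square(n):
        return n * n

    def main():
        nums = list(range(50_000))

        # Serial baseline: the per-item work is a single multiplication.
        t0 = time.perf_counter()
        serial = [square(n) for n in nums]
        print(f"serial:   {time.perf_counter() - t0:.4f}s")

        # Parallel version: pickling each item, handing it to a worker
        # process, and shipping the result back costs far more than the
        # multiplication itself.
        t0 = time.perf_counter()
        with ProcessPoolExecutor() as pool:
            parallel = list(pool.map(square, nums))
        print(f"parallel: {time.perf_counter() - t0:.4f}s")

        assert serial == parallel  # same results, very different cost

    if __name__ == "__main__":  # guard needed where workers are spawned
        main()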
That said, I think the original comment rightly pointed out how easy it was to make the change and test it; in this case the parallel version did turn out to be noticeably faster.
Parallelization is the nuclear energy of computer science.
Loads of potential, high risk, high reward, and they would have gotten away with it if it weren't for those meddling humans. It's non-trivial and can only be handled by accomplished engineers.
Thus it is not used, or it is encapsulated, out of sight, out of reach of meddling hands. (CPUs shovelling independent work to their execution pipelines come to mind, or neural-net training frameworks, etc.)
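NumPy is a familiar everyday case of that encapsulation, assuming a build linked against a threaded BLAS (OpenBLAS, MKL, ...):

    import numpy as np

    # The caller writes ordinary, sequential-looking code ...
    a = np.random.rand(2000, 2000)
    b = np.random.rand(2000, 2000)

    # ... and the BLAS underneath typically fans this multiply out
    # across every core, with no threads visible at the Python level.
    c = a @ b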
> [...] or is encapsulated, out of sight, out of reach of meddling hands.
That's the real issue here! Most languages have poor abstractions for parallelism and concurrency. (Many languages don't even differentiate between the two.)
Encapsulation and abstraction are how we make things usable.
E.g. letting people roll their own hash tables every time they want one would lead to people shooting themselves in the foot more often than not. Compared to that, Python's dicts are dead simple to use, exactly because they move all the fiddly bits out of the reach of meddling hands.
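The same trick works for parallelism. Python's concurrent.futures executors are one instance of the fiddly bits being moved out of reach; a sketch, with placeholder URLs:

    from concurrent.futures import ThreadPoolExecutor
    import urllib.request

    URLS = [  # placeholders, substitute whatever you actually fetch
        "https://example.com",
        "https://example.org",
        "https://example.net",
    ]

    def fetch(url):
        with urllib.request.urlopen(url, timeout=10) as resp:
            return url, len(resp.read())

    # The executor hides thread creation, work hand-off, and result
    # collection, much as a dict hides hashing and collision handling.
    with ThreadPoolExecutor(max_workers=8) as pool:
        for url, n_bytes in pool.map(fetch, URLS):
            print(url, n_bytes)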
There is a lot of untapped parallelism readily available, just waiting for the right code.