I’m not sure this is fair - a lot of the performance optimisations have come from applied mathematicians rather than programmers, and python is not generally considered to be the bottleneck (it is the interface rather than what is running the computation - it calls a C API which then often uses CUDA and may also run on hardware specifically designed for ML).