Yeah, hence a separate version for CPUs that don't have that problem. That said, maintaining so many of these RSA kernels does seem like a pain. I didn't realize you wrote that code; super cool that it's used in practice!
I am not the original author—this is adapted from an implementation by Shay Gueron, the author of that paper I linked, but I do agree that it's cool!