For the range reduction, I've always been a fan of measuring angles in revolutions rather than radians, since you can range reduce just by extracting the fractional bits. Note that this comes at the cost of messier calculus: the derivative of sin(2πx) is 2π·cos(2πx), so factors of 2π show up every time you differentiate.
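To make the range reduction idea concrete, here is a minimal sketch in Python. The function names (`reduce_turns`, `reduce_turns_fixed`, `sin_turns`) and the 32-bit fixed-point layout are my own assumptions for illustration, not anything from a particular library.

```python
import math

def reduce_turns(turns: float) -> float:
    """Reduce an angle in revolutions to one turn by keeping the fractional part.

    Caveat: for tiny negative inputs the subtraction can round up to exactly 1.0.
    """
    return turns - math.floor(turns)

def reduce_turns_fixed(angle_q32: int) -> int:
    """Same idea for a fixed-point angle with 32 fractional bits (assumed layout).

    Masking off the integer bits is the whole range reduction.
    """
    return angle_q32 & 0xFFFFFFFF

def sin_turns(turns: float) -> float:
    """Sine of an angle given in revolutions (hypothetical helper)."""
    return math.sin(2.0 * math.pi * reduce_turns(turns))
```

With radians the equivalent step needs a subtraction of a multiple of 2π, which cannot be represented exactly, so the reduction itself loses accuracy for large arguments. In revolutions the reduction is exact and the 2π only appears once, after the argument is already small.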
I can't for the life of me find the Sony presentation, but the fastest polynomial evaluation is somewhere between Horner's method (which is one long serial dependency chain, so it pipelines badly) and evaluating every term independently (which does redundant work computing the powers).
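One well-known scheme that sits between those two extremes is Estrin's scheme. I can't say it is what the presentation described, so treat this as a sketch of the general idea only. The names `horner` and `estrin_deg7` are made up for illustration, and the coefficients are ordered so that `c[i]` multiplies `x**i`.

```python
def horner(coeffs, x):
    """Horner's method: every multiply-add depends on the previous one."""
    acc = 0.0
    for c in reversed(coeffs):
        acc = acc * x + c
    return acc

def estrin_deg7(c, x):
    """Estrin's scheme for a degree-7 polynomial (8 coefficients).

    The four pair sums are independent of each other, so a pipelined or
    superscalar CPU can overlap them. The price is two extra multiplies
    for the powers x2 and x4.
    """
    x2 = x * x
    x4 = x2 * x2
    p01 = c[0] + c[1] * x
    p23 = c[2] + c[3] * x
    p45 = c[4] + c[5] * x
    p67 = c[6] + c[7] * x
    lo = p01 + p23 * x2
    hi = p45 + p67 * x2
    return lo + hi * x4
```

Python won't show the speed difference, of course. The point is the shape of the dependency graph: Horner's chain is seven multiply-adds deep, while the Estrin version is about three levels deep. The two can also differ in the last bits of the result because the rounding happens in a different order.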
Totally with you on not relying on fast math! Not that I had much choice when I was working on games because that decision was made higher up!