There are a bunch of real situations where you can assume the input will be in a small range. And while reducing from [-pi;pi] or [-2*pi;2*pi] or whatever will slow it down somewhat, I'm pretty sure it wouldn't be significant compared to the FP arithmetic itself. (And branching on inputs outside even that expanded target expected range is a fine strategy, realistically.)
Most real math libraries will do this with only a quarter of the period, accounting for both sine and cosine in the same numerical approximation. You can then do range reduction into the region [0, pi/2) and run your approximation, flipping the X or Y axis as appropriate for either sine or cosine. This can be done branchlessly and in a SIMD-friendly way, and is far better than using a higher-order approximation to cover a larger region.
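A rough scalar sketch of that idea, assuming the common variant that folds to the nearest multiple of pi/2 and evaluates on [-pi/4, pi/4] (the coefficients are plain Taylor terms for illustration, huge arguments aren't handled, M_PI is assumed available from math.h, and a SIMD version would replace the switch with masked selects):

    #include <math.h>

    /* One small sin polynomial and one small cos polynomial on
       [-pi/4, pi/4] cover the whole circle; the quadrant index picks
       which one to use and fixes the sign. */
    static double sin_poly(double r) {            /* ~sin(r), |r| <= pi/4 */
        double r2 = r * r;
        return r * (1.0 + r2 * (-1.0/6.0 + r2 * (1.0/120.0)));
    }

    static double cos_poly(double r) {            /* ~cos(r), |r| <= pi/4 */
        double r2 = r * r;
        return 1.0 + r2 * (-0.5 + r2 * (1.0/24.0));
    }

    double fast_sin(double x) {
        double k = nearbyint(x * (2.0 / M_PI));   /* nearest multiple of pi/2 */
        double r = x - k * (M_PI / 2.0);          /* residual in [-pi/4, pi/4] */
        switch ((int)k & 3) {                     /* quadrant 0..3 */
            case 0:  return  sin_poly(r);
            case 1:  return  cos_poly(r);
            case 2:  return -sin_poly(r);
            default: return -cos_poly(r);
        }
    }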
That's only if they're unpredictable; sure, perhaps on some workload it'll be unpredictable whether the input to sin/cos is greater than 2*pi, but I'm pretty sure on most it'll be nearly always a "no". Perhaps not an optimization to take in general, but if you've got a workload where you're fine with 0.5% error, you can also spend a couple of seconds thinking about what range of inputs to handle in the fast path. (Hence "target expected range": unexpected inputs getting unexpected branches won't slow things down if you've calibrated your expectations correctly; edited my comment slightly to make it clearer that this is about being outside the expanded range, not just [-pi/2,pi/2].)
I'm of course not suggesting branching in cases where you expect a 30% misprediction rate. You'd do branchless reduction from [-2*pi;2*pi] or whatever you expect to be frequent, and branch on inputs with magnitude greater than 2*pi if you want to be extra sure you don't get wrong results if usage changes.
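A sketch of what that split might look like, where fast_sin stands in for whatever branchless-reduction approximation covers the expected range, and the 2*pi cutoff / libm fallback are placeholders for whatever you actually expect to be frequent:

    #include <math.h>

    double fast_sin(double x);   /* assumed: branchless reduction +
                                    polynomial, valid for |x| <= 2*pi */

    double fast_sin_guarded(double x) {
        /* Rare, well-predicted branch: anything outside the calibrated
           range falls back to libm, so results stay correct if usage
           changes; in-range inputs only pay for the compare. */
        if (fabs(x) > 2.0 * M_PI)
            return sin(x);
        return fast_sin(x);
    }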
Again, we're in a situation where we know we can tolerate a 0.5% error, we can spare a bit of time to think about what range needs to be handled fast or supported at all.
Those reductions need to be part of the function being benchmarked, though. Even assuming a range limitation of [-pi,pi] would be reasonable; there are certainly cases where you don't need multiple revolutions around a circle. But this can't even do that, so it's simply not a substitute for sin, and claiming 40x faster is a sham.
Right; the range reduction from [-pi;pi] would be like 5 instrs ("x = abs(x) > pi/2 ? copysign(pi, x) - x : x" or so, using sin(pi - x) = sin(x)), ~2 cycles throughput-wise, I think; that's slightly more significant than I was imagining, hmm.
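For concreteness, a guess at that fold in scalar C, using sin(pi - x) = sin(x); whether the compiler actually emits a blend rather than a branch for the ternary isn't guaranteed:

    #include <math.h>

    /* Fold [-pi, pi] into [-pi/2, pi/2] without changing sin(x):
       abs + compare + copysign + subtract + select, roughly the
       5 instructions / ~2 cycles of throughput mentioned above. */
    static double fold_to_half_pi(double x) {
        double mirrored = copysign(M_PI, x) - x;  /* sin(pi - x) = sin(x) */
        return fabs(x) > M_PI_2 ? mirrored : x;
    }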
It's indeed not a substitute for sin in general, but it could be in some use-cases, and for those it could really be 40x faster (say, cases where you're already doing range reduction externally because it's needed for some other reason; in general you don't want your angles accumulating magnitude forever anyway).