
It may be useful to mention that modern architectures often have vectorized instructions like vrsqrt14ps (accessible via the _mm512_rsqrt14_ps intrinsic) that provide a 14-bit approximation in every lane (there are more accurate variants too) with an inverse throughput of 2. These are faster than the integer bit hacks.

https://software.intel.com/sites/landingpage/IntrinsicsGuide...



AMD Vega has V_RSQ_F32 ("Reciprocal Square Root"), and NVidia has rsqrt.approx.f32. ARM has vrsqrteq_f32 for the estimate, plus vrsqrtsq_f32 for the Newton-Raphson step.

So all the major platforms, CPUs and GPUs, implement the fast reciprocal square root to a decent level of accuracy, with no need for bit-twiddling anymore.


Though it's worth noting that this is only available on specific AVX-512 CPUs.

You can still get a vectorised version at regular precision with AVX2, though.


_mm256_rsqrt_ps gives 12-bit accuracy in AVX.

https://software.intel.com/sites/landingpage/IntrinsicsGuide...


I imagine the bit hacks are still useful for microcontroller applications though.
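
For anyone who hasn't seen it, here is a rough sketch of the kind of integer bit hack being discussed. It is written in Python, emulating 32-bit floats with struct, purely for illustration. The magic constant 0x5f3759df is the well-known one from the Quake III source and is not taken from this thread, and the function name fast_rsqrt_bits is made up for this example. A microcontroller version would be the same few integer operations in C.

```python
import struct

def fast_rsqrt_bits(x):
    """Rough 1/sqrt(x) for a positive, normal float32 value, using only
    an integer shift and subtract on the float's bit pattern.
    Hypothetical helper for illustration; no refinement step is applied."""
    # Reinterpret the float32 bit pattern as an unsigned 32-bit integer.
    i = struct.unpack("<I", struct.pack("<f", x))[0]
    # Halving the exponent and negating it approximates x ** -0.5;
    # the constant (assumed here, from Quake III) corrects the bias.
    i = 0x5F3759DF - (i >> 1)
    # Reinterpret the integer back as a float32.
    return struct.unpack("<f", struct.pack("<I", i))[0]
```

On its own this is only accurate to within a few percent, which is why the original trick follows it with a Newton iteration.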


I prefer architectures that have a vector instruction for computing one or two Newton iterations instead.

That way you can quickly get the precision that you want ;)


vrsqrtps has a throughput of one per cycle on Skylake / Icelake.

You can always run Newton's method afterwards to improve the accuracy. But getting the maximum accuracy from one cycle is probably best.
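
To make the Newton refinement concrete, here is a minimal sketch in Python. The update y * (1.5 - 0.5 * x * y * y) is the standard Newton-Raphson step for 1/sqrt(x). The helper coarse_rsqrt is a hypothetical stand-in for a hardware estimate (it just truncates the exact value to about 12 bits), and all three function names are made up for this example.

```python
import math

def coarse_rsqrt(x, bits=12):
    """Hypothetical stand-in for a hardware estimate: the exact
    1/sqrt(x) with its mantissa truncated to `bits` bits."""
    m, e = math.frexp(1.0 / math.sqrt(x))   # m is in [0.5, 1)
    return math.ldexp(math.floor(m * 2**bits) / 2**bits, e)

def newton_rsqrt_step(x, y):
    """One Newton-Raphson step refining y towards 1/sqrt(x)."""
    return y * (1.5 - 0.5 * x * y * y)

def rel_error(x, y):
    """Relative error of y as an approximation of 1/sqrt(x)."""
    return abs(y * math.sqrt(x) - 1.0)
```

If the estimate has relative error e, one step leaves roughly 1.5 * e * e, so each step about doubles the number of correct bits. Calling newton_rsqrt_step once on a 12-bit estimate is enough for single precision, and twice gets close to double precision.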


You can create an instruction to perform X Newton iterations in 1 cycle if you want, and you can pick X to give you 14-bit precision.

With such an instruction, you could exponentially increase the precision in 2 cycles by just using the same instruction twice.

You can't do that on Intel's hardware. As you mention, you'd need to roll your own Newton iteration for rsqrt out of several SIMD instructions, and run it after the first SIMD call.

That's really sad. There is hardware to perform Newton iterations on Intel CPUs, that's how that instruction is implemented, but the ISA only exposes this hardware via a "do a 14-bit rsqrt" operation, which means that you can't really use it to increase precision if 14 bits does not suffice for your app.



