> I've extracted a small subset of the data to graph in Excel
Thanks. If you still have the data (or if you can regenerate it) it would actually be possible to make a few small graphs that would cover the whole set:
Basically, the idea is to have as many x points as there are different exponents, which in 32-bit floats is at most 256. Then on the y axis one plots the number of bits of maximum distance between the really correct value of the mantissa and the calculated mantissa over the whole interval of one exponent. Those graphs would allow comparing the implementations of Intel and AMD separately, and they are what I'd be very interested to see. So the idea is to find the maximums within each interval, and there is then only a limited number of intervals. Only if such graphs match between AMD and Intel would it be interesting to compare inside the intervals; differences at the per-interval level are the ones I'd expect to be the obvious ones causing the most problems, where results like those in the article (full black instead of a shadow) wouldn't be surprising.
For that you don't have to juggle huge files; it's only 256 values per CPU in that pass that need to be compared.
I can re-generate the data files, it's a couple pages of code to copy-paste from my gist and compile, but I'm not sure how to implement what you wrote.
The results are not guaranteed to have the same exponent, e.g. 1 / 0.499999 can be either 1.999999 or 2.0000001; both are correct within the precision, but they have different exponents.
1) Use the binary representation of the numbers! To do so, cast the resulting float to an unsigned integer, then use bit masks and shifts to extract the exponent and mantissa. Note that in the IEEE format the leading 1 is implicit rather than explicit, unless it's a denormal number, so make it always explicit during the extraction.
2) Use the exponent of the correct result as the "interval" reference.
If the exponents are the same, subtract the smaller mantissa from the bigger one; that's the absolute "distance" between the two numbers -- the goal is to find the biggest absolute distance in each interval.
3) If one of the 2 values being compared has a different exponent, they can be brought to the same exponent by a bit shift: shift the mantissa of the one with the bigger exponent left accordingly. Again do the subtraction and use the result as the absolute distance, maintaining a maximum for each interval. A small sketch of all three steps follows below.
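Something like this minimal scalar sketch, assuming finite, same-sign values whose exponents differ by at most a few (the names are mine, just for illustration):

    #include <algorithm>
    #include <cstdint>
    #include <cstring>

    // Step 1: reinterpret the float's bits, pull out the biased exponent and
    // the mantissa, and make the implicit leading 1 explicit (non-denormals).
    static void extractBits(float f, uint32_t &exponent, uint32_t &mantissa)
    {
        uint32_t bits;
        std::memcpy(&bits, &f, sizeof(bits));
        exponent = (bits >> 23) & 0xFF;   // 8-bit biased exponent
        mantissa = bits & 0x7FFFFF;       // 23 stored mantissa bits
        if (exponent != 0)
            mantissa |= 0x800000;         // explicit leading 1
    }

    // Steps 2 and 3: the interval index is the biased exponent of the
    // *correct* result; if the exponents differ, shift the mantissa of the
    // value with the bigger exponent left, subtract, and keep the
    // per-interval maximum of the absolute distance.
    static void accumulate(float correct, float computed, uint32_t maxDist[256])
    {
        uint32_t expC, manC, expX, manX;
        extractBits(correct, expC, manC);
        extractBits(computed, expX, manX);

        if (expC > expX)
            manC <<= (expC - expX);
        else if (expX > expC)
            manX <<= (expX - expC);

        const uint32_t dist = (manC > manX) ? (manC - manX) : (manX - manC);
        maxDist[expC] = std::max(maxDist[expC], dist);
    }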
In short, think binary, not decimal, and measure using these values. The binary values are the only ones that matter; the decimal representation doesn't necessarily reflect the exact values of the bits.
Examples:
float 1.0 == unsigned 0x3f800000: here the exponent is 127 == 2^0 and the mantissa is 0, with the implicit 1 at the start, i.e. explicit: 0x800000
float 0.999999940395355224609375 == unsigned 0x3f7fffff: here the exponent is 126 == 2^-1 and the explicit mantissa is 0xffffff
The absolute distance between these two numbers is 1: adding one to the lowest bit of the mantissa of the smaller number gives the higher number, 0xffffff + 1 = 0x1000000, the latter being the mantissa of the higher number adjusted to the exponent of the smaller one (0x800000 << 1). If the "correct" number was 0x3f800000, then even though a shift was needed to calculate the absolute distance, the interval is still 0 (i.e. 0 is the x axis value, as the correct exponent was 2^0, biased 127), and the value to be plotted on y is 1 until a bigger distance occurs.
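If it helps, here's this worked example run through the hypothetical sketch above (same made-up names, indexing the array by the biased exponent):

    int main()
    {
        uint32_t maxDist[256] = {};
        accumulate(1.0f, 0.999999940395355224609375f, maxDist);
        // maxDist[127] == 1: mantissas 0x1000000 (0x800000 << 1) vs 0xffffff;
        // biased exponent 127 corresponds to the 2^0 interval, x value 0.
        return (maxDist[127] == 1) ? 0 : 1;
    }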
For more examples of the format you can play here:
Also note that a few exponents are special, meaning infinity or NaN. Whenever the "correct" answer is not a NaN or infinity but the "incorrect" is, that should be treated specially, if it actually happens.
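If it does happen, one possible way to detect it with the same bit layout as above (hypothetical helper, not part of any existing code):

    #include <cstdint>

    // Biased exponent 0xFF means infinity (stored mantissa 0) or NaN
    // (stored mantissa != 0); such results shouldn't go into the distance math.
    static bool isSpecial(uint32_t bits)
    {
        return ((bits >> 23) & 0xFF) == 0xFF;
    }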
The Total, exact, less, and greater columns contain counts of floats in a bucket. The sum of the "Total" column gives 2^32, the total count of unique floats.
Computing the highest bit that's different is too slow for this use case; neither SSE nor AVX has a vector version of the BSR instruction. Instead, I'm re-interpreting the floats as integers and computing the difference of the integers. The maxLess, maxGreater, and maxAbs columns have that maximum error, measured as a count of float values. A value of 4989 means something like the 12-13 lowest bits of the mantissa were incorrect.
Source code is here: https://github.com/Const-me/SimdPrecisionTest/blob/master/rc...
It's not particularly readable because I've used AVX2 and OpenMP, but this way it takes less than a second on a desktop, and maybe 1.5 seconds on a laptop, to process all of these floats.
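For anyone who doesn't want to dig through the AVX2 version, a plain scalar sketch of that metric could look roughly like this (my names, not the actual code from the repo); it assumes both values are positive and finite, so the integer ordering matches the float ordering:

    #include <cstdint>
    #include <cstring>

    // Error measured as the count of representable floats between the two
    // values: reinterpret each float as a 32-bit integer and subtract.
    static uint32_t floatDistance(float correct, float computed)
    {
        uint32_t a, b;
        std::memcpy(&a, &correct, sizeof(a));
        std::memcpy(&b, &computed, sizeof(b));
        return (a > b) ? (a - b) : (b - a);
    }

    // e.g. a maximum of 4989 falls between 2^12 = 4096 and 2^13 = 8192,
    // hence "the 12-13 lowest mantissa bits incorrect" mentioned above.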
Based on how I understand the numbers in the tables, it looks to me like both implementations behave the same at the critical points, and AMD obviously achieves a smaller distance from the "exact" value, but has some kind of truncating instead of rounding logic which changes the distribution of the approximations, and you discovered that by counting "less" and "greater." Congratulations!