I'm reminded of a guy tasked with testing a 32-bit floating point library. After a number of false starts he settled on the most effective approach possible:
Brute force.
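For a 32-bit function of one argument, brute force means enumerating all 2^32 bit patterns and checking every single result. A minimal sketch of that idea in Python, with a hypothetical `my_sqrt32` standing in for the function under test (in practice you'd write this in C for speed and sweep the full `range(0, 2**32)`):

```python
import math
import struct

def bits_to_float(bits: int) -> float:
    """Reinterpret a 32-bit pattern as an IEEE-754 single-precision value."""
    return struct.unpack("<f", struct.pack("<I", bits))[0]

def round_to_f32(x: float) -> float:
    """Round a Python double to the nearest representable float32."""
    return struct.unpack("<f", struct.pack("<f", x))[0]

def my_sqrt32(x: float) -> float:
    # Hypothetical implementation under test; stubbed here by rounding
    # the double-precision result to float32.
    return round_to_f32(math.sqrt(x))

def brute_force_check(start: int, stop: int) -> int:
    """Check every bit pattern in [start, stop); return the mismatch count.
    'Brute force' means calling this with the full 2**32 range."""
    mismatches = 0
    for bits in range(start, stop):
        x = bits_to_float(bits)
        if math.isnan(x) or x < 0.0:
            continue  # outside sqrt's domain
        expected = round_to_f32(math.sqrt(x))
        if my_sqrt32(x) != expected:
            mismatches += 1
    return mismatches
```

The payoff is total coverage: no argument about which inputs are "representative," because all of them were tried.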
Oh, another story. An OG (original geek) I know once proved the floating point unit on a mainframe was broken, via brute force. The broken part meant that the decimal floats used by (and only by) the accounting department sometimes produced erroneous results.
Me, I have a pitiful number of unit tests for the main codebase I work on. One module which does have unit tests also had a bug that resulted in us having to replace a couple of thousand units in the field.
Otherwise most of the bugs that chap my keister aren't about procedural correctness, they're about state.
How do you test it by brute force? How is the algorithm supposed to know what the right thing to return is, unless you rewrite the whole thing, correctly this time? And, if you've done that, just replace the actual function with the test and be done with it.
It's a good idea in principle, though your third and fourth properties are not true of floating point addition.
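The parent doesn't say which properties those are, but a likely suspect is associativity, which holds over the reals and fails under floating point rounding. A quick demonstration (Python doubles, but the same effect occurs with 32-bit floats):

```python
# Associativity fails under IEEE-754 rounding: the intermediate
# sums round differently depending on grouping.
a, b, c = 0.1, 0.2, 0.3

left = (a + b) + c   # 0.1 + 0.2 rounds to 0.30000000000000004 first
right = a + (b + c)  # 0.2 + 0.3 rounds to exactly 0.5 first

print(left == right)  # False
```

So a property-based test asserting `(a + b) + c == a + (b + c)` would flag correct float addition as broken; those properties need an epsilon, or need to be dropped.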
When you have one available, testing the results against a reference implementation is a little more direct. Though, given how easy those property tests are, you might as well do both.
Your comparison source could be running on a different platform, or be slower, or ... and thus not be a drop-in replacement. (E.g. in the example of a broken floating-point unit, compare the results with a software emulation)
Hmm, true, it may have been a different platform/slower, thanks. It can't have been hardware/software, as the GP said they were testing a library (which implies software).
How is the algorithm supposed to know what the right thing to return is
Some times I have, for example, matlab code that I 'know' is right and want to verify that my C/Python/Julia... code does the same thing. So then I just call the matlab code, call the new code and check if they're the same.
Or I have a slow exact method and want to see if my faster approximation always gives an answer 'close enough' to the correct answer.
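Both patterns look the same in code: call the oracle, call the code under test, compare. A sketch of the approximate case, using `math.fsum` (exactly rounded summation) as the slow reference and a naive running sum as the hypothetical fast path, with `math.isclose` supplying the "close enough" tolerance:

```python
import math
import random

def slow_exact_mean(xs):
    """Reference: mean via math.fsum, which tracks rounding error exactly."""
    return math.fsum(xs) / len(xs)

def fast_mean(xs):
    """Approximation under test: naive running sum (hypothetical fast path,
    accumulates rounding error)."""
    total = 0.0
    for x in xs:
        total += x
    return total / len(xs)

def check_close(trials=1000, rel_tol=1e-9, abs_tol=1e-6):
    """Compare the fast method against the exact one on random inputs."""
    rng = random.Random(42)  # fixed seed so failures are reproducible
    for _ in range(trials):
        xs = [rng.uniform(-1e6, 1e6) for _ in range(100)]
        if not math.isclose(fast_mean(xs), slow_exact_mean(xs),
                            rel_tol=rel_tol, abs_tol=abs_tol):
            return False
    return True
```

The fixed seed matters: when the comparison does fail, you want the exact failing input back, not a one-off.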
With floating-point math libraries especially, it's frequently the case that it's relatively easy to compute the desired result accurately by using higher precision for internal computations; e.g. you might compare against the same function evaluated in double and then rounded to float.
This sounds pretty sketchy at first, but it turns out that almost all of the difficulty (and hence almost all of the bugs) are in getting the last few bits right, and the last few bits of a much-higher precision result don't matter when you're testing a lower-precision implementation, so combined with some basic sanity checks this is actually a very reasonable testing strategy (and widely used in industry).
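Concretely, the oracle is "evaluate in double, round once to float32," and the comparison is usually measured in ULPs (units in the last place) rather than absolute error. A sketch with a hypothetical `sin32` under test (stubbed here with the double routine):

```python
import math
import struct

def to_f32(x: float) -> float:
    """Round a double to the nearest float32."""
    return struct.unpack("<f", struct.pack("<f", x))[0]

def f32_bits(x: float) -> int:
    """Raw IEEE-754 single-precision bit pattern of x."""
    return struct.unpack("<I", struct.pack("<f", x))[0]

def ulp_distance(a: float, b: float) -> int:
    """Distance in units-in-the-last-place between two finite float32
    values, via the usual trick of mapping sign-magnitude bit patterns
    onto a monotone integer line."""
    def ordered(bits: int) -> int:
        if bits & 0x80000000:                    # negative values
            return 0x80000000 - (bits & 0x7FFFFFFF)
        return bits + 0x80000000                 # positive values
    return abs(ordered(f32_bits(a)) - ordered(f32_bits(b)))

def sin32(x: float) -> float:
    # Hypothetical single-precision implementation under test.
    return to_f32(math.sin(x))

def check(x: float, max_ulps: int = 1) -> bool:
    """Oracle: compute in double precision, round once to float32,
    then require the implementation to land within max_ulps of it."""
    reference = to_f32(math.sin(float(x)))
    return ulp_distance(sin32(x), reference) <= max_ulps
```

The double-precision result has 29 extra bits of headroom, so its own last-bit errors are far below anything a float32 implementation could be judged on.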
… this reply does nothing to answer the parent's question. It's obvious that you can run it on all possible values, but that doesn't answer the question of the return values. (unless you only want to verify "doesn't crash" or something similarly basic)
I believe the question was edited after I wrote my answer. This, or I just misread it - unfortunately HN doesn't let us know. Anyway, I'm providing another answer to the question as it is now.