How does this compare with a generic tool like Antithesis? I recognize the closed-source, paid vs. open-source, free distinction, but from a feature perspective, would Antithesis be more effective at finding issues, since it isn't limited to what happens inside the JVM and can test concurrency across more complicated network topologies between components?
AFAIK, Antithesis uses a hypervisor to achieve deterministic execution. This can be less effective for finding concurrency bugs, because the hypervisor has no knowledge of language semantics and therefore faces a larger search space. See Figures 5 and 6 in our technical report[1], where we compare Fray against rr, a record-and-replay tool that can also be used for concurrency testing at the OS level[2].
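To make the search-space point concrete, here is a toy sketch. It is not Fray's implementation or API, and the class and method names are made up for illustration. It assumes a scheduler that only considers context switches at shared-memory reads and writes, and it enumerates every interleaving of two simulated threads that each do a non-atomic increment (read the counter, then write back the value plus one).

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical illustration only: simulates two threads, no real threading involved.
class InterleavingSketch {
    // Each simulated thread takes two steps on a shared counter:
    // step 0 reads the counter into a local, step 1 writes local + 1 back.
    static int runSchedule(int[] schedule) {
        int shared = 0;
        int[] local = new int[2]; // per-thread local copy of the counter
        int[] pc = new int[2];    // per-thread program counter
        for (int t : schedule) {
            if (pc[t] == 0) {
                local[t] = shared;      // read
            } else {
                shared = local[t] + 1;  // write
            }
            pc[t]++;
        }
        return shared;
    }

    // Recursively build every ordering in which each thread takes exactly two steps.
    static void enumerate(int[] prefix, int pos, int a, int b, List<int[]> out) {
        if (pos == prefix.length) {
            out.add(prefix.clone());
            return;
        }
        if (a < 2) { prefix[pos] = 0; enumerate(prefix, pos + 1, a + 1, b, out); }
        if (b < 2) { prefix[pos] = 1; enumerate(prefix, pos + 1, a, b + 1, out); }
    }

    static List<int[]> allSchedules() {
        List<int[]> out = new ArrayList<>();
        enumerate(new int[4], 0, 0, 0, out);
        return out;
    }

    // A schedule loses an update when the final counter is not 2.
    static int countLostUpdates() {
        int lost = 0;
        for (int[] s : allSchedules()) {
            if (runSchedule(s) != 2) lost++;
        }
        return lost;
    }

    public static void main(String[] args) {
        System.out.println("schedules: " + allSchedules().size());
        System.out.println("lost updates: " + countLostUpdates());
    }
}
```

With only four scheduling points there are 6 possible schedules, and 4 of them lose an update. A scheduler that has to treat every instruction as a potential preemption point would presumably need to consider far more schedules to cover the same behaviors.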
Antithesis is supposed to be quite a bit faster than rr's chaos mode precisely because it's a hypervisor, whereas rr intercepts syscalls, which is notoriously slow. So rr's per-second throughput feels like a bad proxy for how Antithesis would perform.
Unlike rr's chaos mode, which I believe uses a random search without any knowledge of past runs, Antithesis is supposed to do a more targeted search through the orderings, using history from previous runs. So rr is similarly a bad proxy for the number of executions needed per bug.
I’m also not sure I see how language semantics can be exploited when you’re interleaving based on different thread orderings. If I understand it correctly, Fray is also slightly more limited than something like Antithesis, which can test I/O failures and different I/O orderings in a distributed setting.