AFAIK, Antithesis uses a hypervisor to achieve deterministic execution. That approach can be less effective because the hypervisor has no knowledge of language semantics and faces a larger search space. You may check Figures 5 and 6 in our technical report[1], where we compare Fray against rr, a record-and-replay tool that can also be used for concurrency testing at the OS level[2].
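A back-of-the-envelope sketch of the search-space point, in Python. The step counts are made up, and the idea that a language-aware scheduler only has to switch threads at synchronization points is a simplification for illustration, not a description of how Fray, rr, or Antithesis actually schedule threads.

```python
from math import comb

def interleavings(steps_per_thread: int, threads: int = 2) -> int:
    """Count the distinct orderings of `threads` threads that each run
    `steps_per_thread` atomic steps (a multinomial coefficient)."""
    total = 1
    remaining = steps_per_thread * threads
    for _ in range(threads):
        # Choose which of the remaining slots belong to the next thread.
        total *= comb(remaining, steps_per_thread)
        remaining -= steps_per_thread
    return total

# Hypothetical numbers, chosen only for illustration.
# A scheduler that may preempt at any of 20 instructions per thread:
instruction_level = interleavings(steps_per_thread=20)
# A scheduler that only preempts at 3 synchronization points per thread:
sync_point_level = interleavings(steps_per_thread=3)

print(instruction_level)  # 137846528820
print(sync_point_level)   # 20
```

Under these assumptions the same two-thread program has C(40, 20) candidate schedules at instruction granularity but only C(6, 3) = 20 at synchronization-point granularity.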
Antithesis is supposed to be quite a bit faster than rr chaos mode precisely because it’s a hypervisor, whereas rr intercepts syscalls, which is notoriously slow. So a per-second comparison against rr feels like a bad proxy.
Unlike rr chaos mode, which I believe uses a random search without any knowledge of past runs, Antithesis is supposed to do a more targeted search through the orderings, informed by the history of previous runs. So rr is similarly a bad proxy for the number of executions needed per bug.
I’m also not sure I see how language semantics can be exploited when you’re interleaving based on different thread orderings. If I understand it correctly, Fray is also slightly more limited than something like Antithesis, which can additionally test I/O failures and different I/O orderings in a distributed setting.
[1]: https://arxiv.org/pdf/2501.12618
[2]: https://robert.ocallahan.org/2016/02/introducing-rr-chaos-mo...