Yep, I saw that go by, and a lot of my work is heavily influenced by Andrew Lines. What I've been seeing, though, is that QDI is really bad at arithmetic, because the acknowledgement requirements turn XORs into a nightmare hairball of signal dependencies. On the other hand, QDI is really good at complex control.
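To make the XOR point a bit more concrete, here's a rough Python sketch of a dual-rail XOR built DIMS-style (one C-element per input minterm, ORed onto the output rails). That's just one textbook way to build it, not the precharge domino template from the thesis, and the names (`DimsXor`, `c_element`, `encode`) are made up for illustration.

```python
from itertools import product

NEUTRAL = (0, 0)  # (true_rail, false_rail); both low means "no data yet"

def encode(bit):
    """Dual-rail encode a single bit as (true_rail, false_rail)."""
    return (1, 0) if bit else (0, 1)

def c_element(inputs, prev):
    """Muller C-element: rises when all inputs are high, falls when all are low, else holds."""
    if all(inputs):
        return 1
    if not any(inputs):
        return 0
    return prev

class DimsXor:
    """Hypothetical behavioral model: four 2-input C-elements, one per (a, b) minterm."""

    def __init__(self):
        self.m = {xy: 0 for xy in product((0, 1), repeat=2)}

    def step(self, a, b):
        for (x, y) in list(self.m):
            rail_a = a[0] if x else a[1]  # pick the rail that matches value x
            rail_b = b[0] if y else b[1]
            self.m[(x, y)] = c_element((rail_a, rail_b), self.m[(x, y)])
        true_rail = self.m[(0, 1)] | self.m[(1, 0)]
        false_rail = self.m[(0, 0)] | self.m[(1, 1)]
        return (true_rail, false_rail)

gate = DimsXor()
print(gate.step(encode(1), NEUTRAL))    # only one input valid: output stays neutral
print(gate.step(encode(1), encode(0)))  # both valid: output becomes valid
print(gate.step(NEUTRAL, encode(0)))    # one input reset: output holds
print(gate.step(NEUTRAL, NEUTRAL))      # both reset: output returns to neutral
```

The output can't go valid until every input is valid, and can't go neutral until every input is neutral. Unlike AND or OR, XOR never lets you drop a minterm, so in this style an n-input parity needs all 2^n minterms, each watching every input.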
That's not true for every approach. For example, there's bundled data, dual-rail domino QDI, and various commercial groups like Wave Computing and Eta Compute, which have their own asynchronous flavors, often optimized for arithmetic operations.
I was specifically talking about dual-rail domino QDI. When you compare the typical dual-rail domino QDI adder found in Andrew Lines' thesis against a typical clocked carry-lookahead adder like Kogge-Stone, it is worse by a factor of 2 to 3 in energy, area, and throughput.
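For reference, the clocked side of that comparison is a parallel-prefix adder. Here's a minimal bit-level Python sketch of the Kogge-Stone carry computation, assuming zero carry-in and a fixed word width. `kogge_stone_add` and `width` are names I picked, and this only models the logic; it says nothing about energy, area, or throughput.

```python
def kogge_stone_add(a, b, width=32):
    """Add two unsigned integers modulo 2**width using Kogge-Stone prefix steps."""
    mask = (1 << width) - 1
    p = (a ^ b) & mask   # per-bit propagate
    g = (a & b) & mask   # per-bit generate
    p0 = p               # keep the original propagate bits for the final sum
    d = 1
    while d < width:
        # Combine each bit's group with the group ending d positions below it.
        g = (g | (p & (g << d))) & mask
        p = (p & (p << d)) & mask
        d <<= 1
    carries = (g << 1) & mask  # carry into bit i is the group generate of bits i-1..0
    return (p0 ^ carries) & mask

print(kogge_stone_add(5, 7))
print(kogge_stone_add(200, 100, width=8))
```

The loop runs log2(width) times, and every bit position is updated in parallel at each step. That shallow, regular structure is what the dual-rail QDI adder is up against.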
Bundled data is simple control with the data clocked from that control. It's very much keeping arithmetic away from the QDI circuitry.
Though to be fair, I haven't seen a good examination of how pass transistor logic might affect QDI arithmetic circuitry, so maybe there is hope.