> A clockless pipeline is always going to be slower than a clocked one
How come? In a clocked design, the clock has to be slow enough for every possible logic path to finish. In a clockless one, propagation takes only as long as it needs, so a shorter path can take less time, can't it?
Synchronous design tools are very good at making all of the pipeline stages have about the same logic depth, which is generally 6-8 transitions/cycle but can be much less. The fastest possible QDI circuit is a very simple, very small WCHB buffer which has 6 transitions/cycle. Most QDI logic will have 10-14 transitions/cycle.
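To put rough numbers on those transition counts, here is a minimal sketch. It assumes every transition costs the same gate delay, which real circuits don't honor, and `GATE_DELAY_PS` is a made-up figure purely for illustration.

```python
# Assumption: one transition = one uniform gate delay (real gates vary).
GATE_DELAY_PS = 20.0  # hypothetical value, for illustration only


def cycle_time_ps(transitions_per_cycle, gate_delay_ps=GATE_DELAY_PS):
    """Cycle time if each transition takes exactly one gate delay."""
    return transitions_per_cycle * gate_delay_ps


def relative_throughput(transitions_a, transitions_b):
    """Throughput of design A relative to design B (above 1.0 means A is faster)."""
    return cycle_time_ps(transitions_b) / cycle_time_ps(transitions_a)


# Ranges quoted above: synchronous 6-8, typical QDI 10-14 transitions/cycle.
sync_range = (6, 8)
qdi_range = (10, 14)

# Fastest typical QDI against the slowest synchronous stage.
best_case_for_qdi = relative_throughput(qdi_range[0], sync_range[1])
# Slowest typical QDI against the fastest synchronous stage.
worst_case_for_qdi = relative_throughput(qdi_range[1], sync_range[0])

print(f"QDI runs at {worst_case_for_qdi:.2f}x to {best_case_for_qdi:.2f}x "
      f"the synchronous rate under this model")
```

Under that uniform-delay assumption the gate delay cancels out, so only the ratio of transition counts matters.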
Also, the speed of a linear pipeline is limited by its slowest stage whether or not it is clockless. Going clockless only helps pipeline speed when you have a complex network.
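The slowest-stage point can be checked with a toy model. This is only a sketch under loud assumptions: each stage has a fixed delay, a stage starts a token as soon as the token has arrived and the stage is free, handshake overhead and backpressure are ignored, and the clocked version ignores setup time and skew. The function names and the delays are invented for the example.

```python
def clockless_finish_times(stage_delays, n_tokens):
    """Time at which each token leaves the last stage of a linear pipeline.

    Assumption: a stage starts work on token k once the previous stage has
    finished token k and this stage has finished token k-1.
    """
    arrivals = [0.0] * n_tokens  # all tokens are available at t = 0
    for delay in stage_delays:
        finished = []
        free_at = 0.0  # when this stage is next idle
        for k in range(n_tokens):
            start = max(arrivals[k], free_at)
            free_at = start + delay
            finished.append(free_at)
        arrivals = finished  # this stage's outputs feed the next stage
    return arrivals


def clocked_period(stage_delays):
    """Clock period must cover the slowest stage (setup and skew ignored)."""
    return max(stage_delays)


delays = [3.0, 5.0, 2.0, 4.0]  # hypothetical per-stage delays
out = clockless_finish_times(delays, 6)
intervals = [b - a for a, b in zip(out, out[1:])]

print("clockless output intervals:", intervals)
print("clocked period:", clocked_period(delays))
```

In this model the interval between outputs of the clockless pipeline equals the delay of the slowest stage, which is also the clock period the synchronous version needs. The faster stages just sit idle waiting on the slow one.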
I don't think it's really fair to condemn all of asynchronous due to the slowness of QDI. There are faster ways of doing things like GaSP, dual rail domino done detection, bundled data, one sided handshaking, etc.
You're right, and I don't intend to condemn all of async, or even QDI for that matter :) I am doing my PhD on it, so I do think there is promise. I just think that arithmetic is better handled by bundled data specifically. Let QDI do the control leg-work and tack high-performance arithmetic onto it.
Also, GaSP is certainly faster, but is limited to simple pipelines. That's why I like QDI: it lets me make weird circuits.
EDIT: Sorry, I got mixed up between the conversation threads... dyslexia is a thing.
I'm not saying condemn async or QDI, but we must recognize what it is good at and what it is not. A QDI pipeline stage may be slower, yes. So don't use it if you just want to implement a linear pipeline. But do use it if you have a complex network, because of the previously mentioned benefits. GaSP and other async pipeline topologies don't have the flexibility of QDI, and there isn't really a good framework for mixing them with QDI techniques at the moment (maybe relative timing?). The power of async comes from this flexibility and the ability to avoid unnecessary computation.