Wow, thank you for digging that up. I suspected there might be some gain there, but from the abstract it looks like my naive estimate of what might be achievable was off by a massive amount.
"Theoretical analysis shows that XOR-Net reduces one-third of the bit-wise operations compared with traditional binary convolution, and up to 40% of the full-precision operations compared with XNOR-Net. Experimental results show that our XOR-Net binary convolution without scaling factors achieves up to 135× speedup and consumes no more than 0.8% energy compared with parallel full-precision convolution."
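For context, the core trick being compared is computing a dot product of ±1 vectors with bitwise ops plus a popcount. Here is my own minimal sketch of that idea (not the paper's actual kernel; the function names and bit encoding are my assumptions):

```python
# Sketch of the binarized dot-product trick (my illustration, not XOR-Net's code).
# Vectors in {-1, +1}^n are packed into integers: bit = 1 encodes +1, bit = 0 encodes -1.

def bin_dot_xor(a_bits: int, b_bits: int, n: int) -> int:
    """Dot product via XOR: each differing bit pair contributes -1,
    each matching pair +1, so a.b = n - 2 * popcount(a XOR b)."""
    return n - 2 * bin(a_bits ^ b_bits).count("1")

def bin_dot_xnor(a_bits: int, b_bits: int, n: int) -> int:
    """Same result via XNOR: a.b = 2 * popcount(XNOR(a, b)) - n.
    Note the extra NOT and mask needed to keep the result to n bits."""
    mask = (1 << n) - 1
    return 2 * bin(~(a_bits ^ b_bits) & mask).count("1") - n

# Example: a = (+1, -1, +1, -1), b = (+1, -1, -1, -1)
# dot = 1 + 1 - 1 + 1 = 2, and both formulations agree.
print(bin_dot_xor(0b1010, 0b1000, 4))   # -> 2
print(bin_dot_xnor(0b1010, 0b1000, 4))  # -> 2
```

The two are mathematically equivalent; the interesting claim in the abstract is about which formulation (and which surrounding full-precision steps, like the scaling factors) costs fewer operations in practice.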
The big question is whether the accuracy results hold up. It would be super interesting to see whether applying this to LLMs gives comparable benefits.