Sub-1-bit weights have been done at least as far back as 2016 for VGG-style networks (my work).
I was able to get 0.68 "effective" bits.
The idea is that in each forward pass you add noise to each weight, drawn independently from a normal distribution, and when you calculate the SNR it works out to less than 1 bit per weight. This points to the idea that a stochastic memory element could be used to store the weights.
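A rough sketch of what I mean, not the original code. I'm assuming here that "effective bits" is the Gaussian-channel capacity per weight, 0.5 * log2(1 + SNR), with SNR taken as weight power over noise variance; the function names, layer shape, and sigma are made up for illustration.

```python
import numpy as np

def noisy_forward(x, w, sigma, rng):
    """Linear layer forward pass with fresh, independent Gaussian noise on every weight."""
    noise = rng.normal(0.0, sigma, size=w.shape)
    return x @ (w + noise)

def effective_bits(w, sigma):
    """Assumed definition: Gaussian-channel capacity per weight, 0.5 * log2(1 + SNR)."""
    snr = np.mean(w ** 2) / sigma ** 2  # signal power over noise power
    return 0.5 * np.log2(1.0 + snr)

rng = np.random.default_rng(0)
w = rng.normal(0.0, 1.0, size=(256, 128))  # hypothetical layer weights
x = rng.normal(0.0, 1.0, size=(4, 256))    # hypothetical input batch
sigma = 1.0                                # noise std roughly equal to weight std

y = noisy_forward(x, w, sigma, rng)        # a new noise sample is drawn on each call
bits = effective_bits(w, sigma)
print(y.shape, round(float(bits), 2))
```

Under that assumed definition, any SNR below 3 gives less than 1 bit per weight, so the noise only has to be on the same order as the weights themselves.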
My 0-bit, no-input model can predict if you have cancer with 99.5% accuracy and 0.5% false negative rate. Don't ask about whether the cancer cell(s) are benign.