Why not just always spin until a fixed number of ticks (or microseconds for slewing clocks) have passed (starting from function entry), prior to returning?
Obviously this doesn't mitigate power usage side channel attacks, but that's not the point here.
You could totally do that, but just as with the above, you'd want to make sure the compiler doesn't optimize your spin loop away because it thinks it isn't needed. My understanding is that in lots of real applications, the asm code I mentioned is partly there to make sure each branch takes a specific number of clock cycles, so that they all balance exactly.
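For what it's worth, here's a rough sketch in C of the "spin until a fixed deadline" idea, assuming a POSIX system with `clock_gettime`. The names `now_ns`, `spin_until`, `leaky_compare`, and `padded_compare` are made up for illustration, and the early-exit compare is just a stand-in for whatever variable-time work you're trying to hide. The loop condition depends on the result of a clock read, which the compiler has to treat as an observable call, so it can't conclude the loop is dead code.

```c
#define _POSIX_C_SOURCE 200809L
#include <stddef.h>
#include <stdint.h>
#include <time.h>

/* Read the monotonic clock in nanoseconds. CLOCK_MONOTONIC does not jump
 * when the wall clock is set. */
static uint64_t now_ns(void) {
    struct timespec ts;
    clock_gettime(CLOCK_MONOTONIC, &ts);
    return (uint64_t)ts.tv_sec * 1000000000ull + (uint64_t)ts.tv_nsec;
}

/* Busy-wait until the clock reaches deadline_ns. Each iteration re-reads
 * the clock, so the loop has a side-effecting call the optimizer must keep. */
static void spin_until(uint64_t deadline_ns) {
    while (now_ns() < deadline_ns) {
        /* spin */
    }
}

/* Hypothetical variable-time operation: returns at the first mismatch,
 * so its running time depends on the data. */
static int leaky_compare(const unsigned char *a, const unsigned char *b,
                         size_t n) {
    for (size_t i = 0; i < n; i++) {
        if (a[i] != b[i]) {
            return 0;
        }
    }
    return 1;
}

/* Take the deadline at function entry, do the work, then pad out the
 * remaining time before returning. budget_ns is an assumed upper bound
 * on the worst-case running time of the work. */
int padded_compare(const unsigned char *a, const unsigned char *b, size_t n,
                   uint64_t budget_ns) {
    uint64_t deadline = now_ns() + budget_ns;
    int result = leaky_compare(a, b, n);
    spin_until(deadline);
    return result;
}
```

This only helps if `budget_ns` is larger than the worst case of the work being padded; if the work ever overruns the deadline, the overrun is visible again. It also returns at or slightly after the deadline rather than on an exact cycle, which is presumably part of why real implementations lean on hand-balanced asm instead.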
It's time-bound, so let's check time.