The conventional x86 way of doing it was to spill to "the stack" and I'm told th...

brigade · on March 23, 2013

The only stack-specific optimization is fusing the decrement/increment of esp/rsp with the load/store µop. That's done mainly because push/pop are one-byte opcodes, unlike general loads and stores.

Everything else is a general memory optimization that applies to any memory access, like the aforementioned store forwarding. Stack accesses are still expensive when the CPU can't use those optimizations (mismatched load/store sizes, incorrect speculation, etc.).
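
To make the "mismatched load/store size" case concrete, here's a rough C sketch. The function names are made up for illustration, and whether the second pattern actually stalls depends on the microarchitecture and on what the compiler emits. On many x86 cores, a wide load that spans several narrower in-flight stores can't be forwarded from the store buffer, while a load that matches the preceding store can.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical helper: one 32-bit store, then a 32-bit load of the same slot.
   Matching sizes are the case store forwarding is expected to handle. */
uint32_t matched_roundtrip(uint32_t v) {
    volatile uint32_t slot;   /* volatile so the compiler keeps the memory accesses */
    slot = v;
    return slot;
}

/* Hypothetical helper: four 1-byte stores, then one 32-bit load spanning them.
   This is the mismatched pattern (assumed slow; not measured here). */
uint32_t mismatched_roundtrip(uint32_t v) {
    volatile union { uint32_t word; uint8_t bytes[4]; } slot;
    uint8_t src[4];
    memcpy(src, &v, sizeof src);   /* native byte order, so the result equals v */
    for (int i = 0; i < 4; i++)
        slot.bytes[i] = src[i];    /* narrow stores */
    return slot.word;              /* wide load over the same bytes */
}

/* Both paths return the same value; only the timing can differ. */
void self_check(void) {
    assert(matched_roundtrip(0x11223344u) == 0x11223344u);
    assert(mismatched_roundtrip(0x11223344u) == 0x11223344u);
}
```

Both functions are functionally identical, so any difference only shows up if you time them in a tight loop on real hardware.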