Out-of-order architectures are inhumanly complex, especially figuring out the dependencies. For example, can we reorder these two instructions or must execute them sequentially?
ld r1, [r2 + 10]
st [r3 + 4], r4
And then consider things like speculative execution.
Honestly to me it seams like optimizing compilers and out-of order CPUs are actually doing the same thing. Can't we get rid of one or the other?
Either have a stupid ISA and do all the work ahead-of-time with way more compute time to optimize or don't optimize and have a higher level ISA, that also hs concepts like pointer provenance.
The current state seams like a local minima with both having ahead-of-time optimization, but the ISA does it's thing anyways and also the compiler throwing much of the information away with OoO analysis being time-critical.
The compiler doesn't know the dynamic state of the CPU memory hierarchy and you don't want it to. Even the CPU doesn't know until it finds out how long a load will take.
Meanwhile the CPU probably can't do a loop invariant hoist in a reasonable way or understand high level semantics.