Yeah, I think this is the biggest issue. Even if you know precisely the configuration of the target computer, you can't know whether a given memory access is going to miss the cache. A conventional out-of-order CPU can reorder instructions at runtime to keep the pipeline full while a load is outstanding, but a VLIW chip can't do this — its schedule is fixed at compile time.
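Here's a minimal sketch of what I mean (my own illustration, not anything specific; the function names and the independent-work loop are made up). The load's latency is unknown at compile time, so an out-of-order core can keep executing the unrelated adds while waiting, whereas a VLIW compiler has to commit to one schedule:

```c
#include <stddef.h>
#include <stdint.h>

/* Walk a linked list (loads likely to miss the cache) while doing
 * independent arithmetic on a separate array.                            */
struct node { struct node *next; int64_t payload; };

int64_t sum_with_independent_work(struct node *head,
                                  const int64_t *a, size_t n)
{
    int64_t list_sum = 0;   /* depends on the (possibly missing) loads   */
    int64_t calc_sum = 0;   /* independent work: no dependence on loads  */
    size_t i = 0;

    while (head != NULL) {
        /* This load may hit L1 (a few cycles) or miss all the way to
         * DRAM (hundreds of cycles); the compiler can't know which.      */
        list_sum += head->payload;
        head = head->next;

        /* An out-of-order core can keep executing these adds while the
         * load above is still in flight. A VLIW schedule fixed at
         * compile time stalls the whole issue packet on a miss, and the
         * independent adds stall with it.                                */
        if (i < n) {
            calc_sum += a[i] * 3 + 1;
            i++;
        }
    }
    return list_sum + calc_sum;
}
```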
In reality, of course, you don't even know the precise configuration of the computer, and you don't know the exact usage pattern of the software. Even if you do profile-guided optimization, someone could run the software on data that produces different branch patterns than the profile did, and then it runs slow. A hardware branch predictor will notice the new pattern at runtime and compensate automatically.
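A quick sketch of that failure mode (again my own example; `__builtin_expect` is a GCC/Clang hint standing in for what PGO would bake into the binary, and the threshold 128 is arbitrary):

```c
#include <stddef.h>
#include <stdint.h>

int64_t count_large(const uint8_t *data, size_t n)
{
    int64_t count = 0;
    for (size_t i = 0; i < n; i++) {
        /* Suppose the profiling run saw values almost always < 128, so
         * the compiler laid out code assuming this branch is rarely
         * taken. If a user later feeds data that is mostly >= 128, that
         * static bet is wrong for the life of the binary. A hardware
         * branch predictor, by contrast, re-learns the new pattern
         * within a few iterations at runtime.                            */
        if (__builtin_expect(data[i] >= 128, 0))
            count++;
    }
    return count;
}
```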