One of the fun parts of compiler optimizations is that they can be competing forces, for example inlining for speed and outlining for size. You can see a similar pattern in garbage collectors (throughput vs latency). It can be easy to land in an infinite loop or an oscillating function.
One technique I've seen is to set a goal for one optimization and, once that's met, throw the remaining resources at the other target. That feels like a bit of a cop-out, though.
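To make that concrete, here is a rough sketch of the "meet one goal, then spend the rest on the other" idea, applied to inlining. Everything in it is made up for illustration: the CallSite record, the per-site cost estimates, and the plan_inlining helper are hypothetical and not taken from any real compiler.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class CallSite:
    name: str
    cycles_saved: int  # assumed estimate of the speed gain if inlined
    bytes_added: int   # assumed estimate of code growth if inlined (> 0)


def plan_inlining(sites, speed_goal):
    """Two-phase plan under a hypothetical cost model.

    Phase 1: inline the sites with the best cycles-per-byte ratio
             until the estimated savings reach speed_goal.
    Phase 2: every remaining site stays out of line, so the rest of
             the budget goes to code size.
    """
    ranked = sorted(
        sites,
        key=lambda s: s.cycles_saved / s.bytes_added,
        reverse=True,
    )
    inlined, saved = [], 0
    for site in ranked:
        if saved >= speed_goal:
            break  # speed goal met, stop spending bytes on speed
        inlined.append(site.name)
        saved += site.cycles_saved
    kept_out = [s.name for s in ranked if s.name not in inlined]
    return inlined, kept_out, saved


sites = [
    CallSite("a", cycles_saved=100, bytes_added=10),
    CallSite("b", cycles_saved=50, bytes_added=50),
    CallSite("c", cycles_saved=30, bytes_added=5),
]
inlined, kept_out, saved = plan_inlining(sites, speed_goal=120)
print(inlined, kept_out, saved)
```

Each site is visited once in a fixed order, so the two goals can't keep undoing each other, which is the oscillation problem mentioned above. The price is that the result depends entirely on where the goal is set.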
That seems like a good idea. I am puzzled about what benefit the RL brings in OP. It seems like a well-defined constrained optimisation problem that could be solved without RL, for example in the way you mentioned.