If there's a list of known optimizations that preserve correctness, then it becomes an optimization problem over output length (as a proxy for cycle count). So is the idea that an LLM is more efficient than a search or direct optimization?
There tend to be a lot of heuristics involved when deciding which optimizations to apply and in which order, so there's plenty of room to apply some machine learning.
Whether LLMs are the right approach is a separate question.
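
To make the "search over output length" framing above concrete, here's a toy greedy search over LLVM pass orderings that keeps a pass only if it shrinks the emitted object file. This assumes opt and llc are on PATH; the candidate pass list is just illustrative, and a real setup would search a much bigger space and worry about compile time.

    # Toy sketch: phase ordering as a search problem, scored by object-file size.
    import os, subprocess, tempfile

    CANDIDATE_PASSES = ["instcombine", "gvn", "simplifycfg", "dce", "sroa"]

    def object_size(ir_path, passes):
        with tempfile.TemporaryDirectory() as d:
            src, obj = ir_path, os.path.join(d, "out.o")
            if passes:
                src = os.path.join(d, "opt.bc")
                subprocess.run(["opt", "-passes=" + ",".join(passes),
                                ir_path, "-o", src], check=True)
            subprocess.run(["llc", "-filetype=obj", src, "-o", obj], check=True)
            return os.path.getsize(obj)

    def greedy_pipeline(ir_path):
        pipeline, best = [], object_size(ir_path, [])
        improved = True
        while improved:
            improved = False
            for p in CANDIDATE_PASSES:
                size = object_size(ir_path, pipeline + [p])
                if size < best:   # keep a pass only if it actually helps
                    pipeline, best, improved = pipeline + [p], size, True
        return pipeline, best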
In SQL optimization, the problem is a bit trickier (IMO) because compilation is in the query path. One successful approach I know of is Bao: https://arxiv.org/abs/2004.03814
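
For anyone curious, the core of Bao is choosing among a handful of optimizer hint sets per query and learning which one to trust. Here's a very stripped-down sketch: epsilon-greedy over observed latencies, whereas the paper uses a learned cost model over plan trees with Thompson sampling, and the hint set names below are made up.

    import random
    from collections import defaultdict

    HINT_SETS = ["default", "disable_nestloop", "disable_indexscan"]  # illustrative

    observed = defaultdict(list)   # hint set -> observed query latencies (ms)

    def choose_hint_set(epsilon=0.1):
        # epsilon-greedy stand-in for Bao's learned-model-plus-Thompson-sampling
        untried = [h for h in HINT_SETS if not observed[h]]
        if untried or random.random() < epsilon:
            return random.choice(untried or HINT_SETS)
        return min(HINT_SETS, key=lambda h: sum(observed[h]) / len(observed[h]))

    def record(hint_set, latency_ms):
        observed[hint_set].append(latency_ms)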
This is scooching into AlphaGo (or whatever the generic implementation is called) territory: mixing predictive AI with traditional optimization algorithms.
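
Roughly, the generic pattern is "learned model proposes, conventional search and measurement verify." A sketch of what that could look like for compiler pass selection; model.rank_passes and measure are hypothetical hooks (e.g. measure could compile with the given pipeline and return object size).

    def guided_beam_search(ir_path, model, measure, beam_width=4, depth=5):
        # model prunes the action space; measure() stays the ground truth
        beam = [([], measure(ir_path, []))]   # (pipeline, measured size)
        for _ in range(depth):
            candidates = []
            for pipeline, _ in beam:
                for p in model.rank_passes(ir_path, pipeline)[:beam_width]:
                    new = pipeline + [p]
                    candidates.append((new, measure(ir_path, new)))
            beam = sorted(candidates, key=lambda c: c[1])[:beam_width]
        return min(beam, key=lambda c: c[1])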
For function inlining specifically, I'm not sure LLMs are necessarily the right choice. The original MLGO paper [1] demonstrated sizable code-size improvements (7-20%) from an ML model making inlining decisions, but they used tens of engineered features. Maybe an LLM could squeeze some additional size wins out, but maybe not [2].
Additionally, there are other factors to consider when productionizing these systems. Compile time is important (and LLMs will almost certainly explode it), and anyone concerned about code size is probably doing (Thin)LTO, which would require feeding a lot more context into an LLM making inlining decisions.
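
For a sense of scale, the MLGO-style setup looks roughly like this: each call site gets summarized by a handful of cheap, engineered features, and a small model (not an LLM) returns the decision. Feature names here are illustrative rather than the exact set from the paper, and model is whatever sklearn-style classifier you trained.

    from dataclasses import dataclass

    @dataclass
    class CallSiteFeatures:
        callee_basic_block_count: int
        callee_users: int
        callsite_height: int        # distance from a call-graph root
        num_constant_params: int
        caller_size: int

    def should_inline(f: CallSiteFeatures, model) -> bool:
        x = [f.callee_basic_block_count, f.callee_users, f.callsite_height,
             f.num_constant_params, f.caller_size]
        return model.predict([x])[0] > 0.5   # any sklearn-style classifier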
I'm probably being very naive with this, but could mechanistic interpretability play a role here? Specifically, to experimentally short-list the LLM's most effective optimizations, then try to peek inside the LLM and perhaps fish out some novel optimization algorithm that could be implemented efficiently without the LLM?
I did a bit of work on this last summer on (much) smaller models [1], and it was briefly discussed towards the end of last year's MLGO panel [2]. For heuristic replacements specifically, you might be able to glean some things (or just use interpretable models like decision trees), but something like a neural network works fundamentally differently from the existing heuristics, so you probably wouldn't recover most of the performance gains.

For just tuning heuristics, the usual practice is to make most of the parameters configurable and then use something like Bayesian optimization to try to find an optimal set, and this is sometimes done as a baseline in ML-in-compiler research.
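
If you did want to try fishing something back out, the low-tech version is distillation: query the learned policy on a pile of call sites, fit a small decision tree to its decisions, and read off the rules. policy.predict and the feature names below are hypothetical, and you only recover whatever a shallow tree can express, which is exactly the "fundamentally different mechanism" caveat above.

    from sklearn.tree import DecisionTreeClassifier, export_text

    FEATURES = ["callee_block_count", "callsite_height", "num_constant_params"]

    def distill(policy, X):
        # X: list of feature vectors; policy.predict is the black-box decision
        y = [policy.predict(x) for x in X]
        tree = DecisionTreeClassifier(max_depth=3).fit(X, y)
        print(export_text(tree, feature_names=FEATURES))
        return tree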