I wonder what a learning model with the ability to test feedback from executing ...

I wonder what a learning model with the ability to test feedback from executing its code in an interpreter would look like? I know there are different groups testing things like integrating the ability for LLMs to use tools, wondering how this will pan out in the end.