One interesting detail in the Code as Policies paper from last year was that they generate code against the given API, then recursively get the LLM to implement any functions it calls that don't already exist. I thought that was quite neat.
I've started seeing those sorts of hallucinated API calls as a signal for something that ought to exist: if a call is predictable enough for an LLM to assume it's there, maybe it should be, if only to make the API easier for humans too.
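For concreteness, the recursive step could be sketched roughly like this (Python). This is my own illustration, not the paper's actual implementation: `llm_complete` stands in for a real model call, and the prompt format is made up.

```python
import ast


def undefined_functions(code, known):
    """Return names of functions called in `code` but defined neither there nor in `known`."""
    tree = ast.parse(code)
    defined = {n.name for n in ast.walk(tree) if isinstance(n, ast.FunctionDef)}
    called = {n.func.id for n in ast.walk(tree)
              if isinstance(n, ast.Call) and isinstance(n.func, ast.Name)}
    return called - defined - known


def expand(code, known, llm_complete, depth=0, max_depth=5):
    """Recursively ask the LLM to implement any function the code calls but never defines."""
    if depth >= max_depth:  # guard against runaway recursion
        return code
    for name in undefined_functions(code, known):
        # Hypothetical LLM call: prompt it to write the missing function,
        # then expand *its* output too, since it may call yet more missing functions.
        body = llm_complete(f"def {name}(")
        code = expand(body, known | {name}, llm_complete, depth + 1, max_depth) + "\n" + code
        known = known | {name}
    return code
```

The `known` set would hold the API functions that actually exist, so only genuinely hallucinated calls trigger another round of generation.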