Anything it has not been trained on. Try getting AI to use OpenAI's Responses API. You will have to try very hard to convince it not to reach for the Chat Completions API instead.
Yeah, once again you need the right context to override what's in the weights. It may not know how to use the Responses API, so you need to provide examples in context (or tools to fetch them).
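For concreteness, here's a rough sketch of the two calls using the OpenAI Python SDK. The model name is a placeholder and exact fields can vary across SDK versions, but it shows the shape of what the model defaults to versus what you asked for:

```python
from openai import OpenAI

client = OpenAI()  # reads OPENAI_API_KEY from the environment

# What models trained on older data tend to emit: the Chat Completions API.
chat = client.chat.completions.create(
    model="gpt-4o",  # placeholder model name
    messages=[{"role": "user", "content": "Say hello."}],
)
print(chat.choices[0].message.content)

# What you actually asked for: the newer Responses API.
response = client.responses.create(
    model="gpt-4o",  # placeholder model name
    input="Say hello.",
)
print(response.output_text)
```

Pasting a snippet like the second call into context (or pointing the model at the Responses API docs) is usually what it takes to stop it from falling back to the first.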
This is just an issue with people who expect AI to solve all of life's problems before they get out of bed, not realising they have no idea how AI works or what it produces, and so concluding "it stops working because it sucks" instead of "it stops working because I don't know what I'm doing".
In my limited experiments with Gemini: it stops working when presented with a program containing fundamental concurrency flaws. Ask it to resolve a race condition or deadlock and it will flail, eventually getting caught in a loop, suggesting the same unhelpful remedies over and over.
I imagine this has to do with concurrency requiring conceptual and logical reasoning, which LLMs are known to struggle with about as badly as they do with math and arithmetic. Now, it's possible that the right language for working with an LLM in these domains is not program code but a spec language like TLA+. However, at that point I'd probably spend less effort just writing the potentially tricky concurrent code myself.
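As a minimal illustration of the kind of deadlock being described (a hypothetical Python example, not from anyone's actual codebase): two threads acquiring the same pair of locks in opposite order.

```python
import threading

lock_a = threading.Lock()
lock_b = threading.Lock()

def worker_1():
    with lock_a:      # holds A, then waits for B
        with lock_b:
            pass

def worker_2():
    with lock_b:      # holds B, then waits for A -> circular wait, deadlock
        with lock_a:
            pass

# The standard fix is a global lock ordering: every thread acquires
# lock_a before lock_b, so no circular wait can form.
def worker_2_fixed():
    with lock_a:
        with lock_b:
            pass

t1 = threading.Thread(target=worker_1)
t2 = threading.Thread(target=worker_2_fixed)  # swap in worker_2 to risk the deadlock
t1.start(); t2.start()
t1.join(); t2.join()
print("done without deadlock")
```

Notably, the fix is a global property of the program (a consistent acquisition order) rather than a local patch, which may be part of why models keep circling through the same local remedies.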
I've had AI totally fail several times on Swift concurrency issues, e.g. threads deadlocking or similar. I've also had it totally fail on memory usage issues in Swift. In both cases I had to go back to reasoning through the bugs myself, debugging them and fixing the code by hand.