All Codex conversations need to be caveated with which model you're using, because results vary significantly. Codex requires very little tweaking, but you do need to select the highest-thinking model if you're writing code, and I'd recommend the highest-thinking non-code model for planning. That's really it. It pushes task time up to 5-20 minutes, but the output is usually great.
Then I ask Opus to take a pass and clean up to match codebase specs and it’s usually sufficient. Most of what I do now is detailed briefs for Codex, which is…fine.
I will jump between a ChatGPT window and a VSCode window with the Codex plugin. I'll create an initial prompt in ChatGPT, which asks the coding agent to audit the current implementation and then draft an implementation plan. The plan bounces between Chat and Codex about 5 times, with Chat telling Codex how to improve it. Then Codex implements and creates an implementation summary, which I give to Chat. Chat then asks for a couple of additional fixes, and then it's done.
Why the non-code model? Also, 5-20 minutes?! I guess I don't know what kind of code you're writing, but for my web app backends/frontends, planning takes like 2-5 minutes tops with Sonnet, and I have yet to feel the need to even try Opus.
I probably write overly detailed starting prompts, but it means I get pretty well-aligned results. It does take longer, but I try to think through the implementation before the planning even starts.