
If you're parsing JSON or other serialization formats, I expect that can be nontrivial. Yes, the total latency is dominated by the LLM call, but (de)serialization is pure CPU work.
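A quick sketch of why this shows up as CPU cost: parsing a few MB of JSON does measurable work even with no I/O involved. (The payload shape here is synthetic, just loosely modeled on an LLM API response; actual timings are machine-dependent.)

```python
import json
import time

# Synthetic ~2 MB payload, roughly the shape of a chat-completion response.
payload = {"choices": [{"message": {"content": "x" * 200}} for _ in range(10_000)]}
blob = json.dumps(payload)

start = time.perf_counter()
parsed = json.loads(blob)  # pure CPU: no network, no disk
elapsed = time.perf_counter() - start

print(f"parsed {len(blob) / 1e6:.1f} MB in {elapsed * 1000:.2f} ms")
```

On a client handling many concurrent responses, that per-parse cost multiplies, which is why it's worth measuring rather than dismissing.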

Also, ideally your lightweight client logic can run on a small device or server with bounded memory usage. If OpenAI spins up a server for each Codex query, the size of that server matters at scale (and in cost), so shaving off MBs of overhead is worthwhile.


