Costs for a given level of intelligence, as measured by various benchmarks, have been falling by 4-8x per year for a couple of years now, largely because better training has produced smarter models at a given size. I think there's still a decent amount of headroom there, and as others have mentioned, dedicated inference chips are likely to be significantly cheaper than running inference on GPUs. I would expect to see Gemini 2.5 Pro levels of capability in models that cost <$1 per million tokens ($1/Mtok) by late next year, or plausibly sooner.
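To make the compounding concrete, here's a minimal sketch of the projection math. The ~$10/Mtok starting price is an illustrative assumption for a current frontier-level model, not a quoted figure:

```python
import math

def years_to_target(current_price: float, target_price: float, annual_factor: float) -> float:
    """Years for cost to fall from current_price to target_price,
    assuming it drops by annual_factor each year."""
    return math.log(current_price / target_price) / math.log(annual_factor)

# Illustrative assumption: a frontier model costs ~$10/Mtok blended today.
current = 10.0
target = 1.0

for factor in (4, 8):
    t = years_to_target(current, target, factor)
    print(f"At {factor}x/year decline: ~{t:.2f} years to reach ${target}/Mtok")
```

Even at the slower 4x rate this gives roughly 1.7 years to cross $1/Mtok, and about 1.1 years at 8x, which is consistent with a "late next year, or sooner" timeline under those assumptions.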