You seem to think GPUs are better than TPUs for rapid iteration. Why is that? There's no inherent reason why one is more suited to rapid iteration than the other; it's entirely a matter of developer tooling and infrastructure. Google famously has excellent tooling, and the tooling it exposes to the outside world is usually poorer than what Googlers use internally.
It's been a few years since I last played with Google's hardware, but iirc TPUs were inflexible and very fast. They worked well for linear and convolutional layers but could not accelerate certain LSTM configurations; for such networks, GPUs were faster. It wouldn't surprise me in the least if TPU hardware support lagged behind what the latest and greatest LLMs require for training.