I've always been leery of outrageous GPU investments; at some point I'll dig up my prior comments where I said as much.
The CEOs, upper management, and governments derive their importance from how much money they can spend. AI gave them the opportunity to confidently say "give me $X and I can deliver Y," then turn around and hand that money to NVidia. The problem was reduced to a simple function of raising money and spending it, making them the all-important central figure. ML researchers are very much secondary to securing funding. Since these people compete with each other on importance, they strove for ever-larger dollar figures - a modern dick-waving competition. Those of us who lobbied for efficiency were sidelined as a threat: it was seen as potentially making the CEO look bad and encroaching on their importance. If the task can be done cheaply by smart people, that severely undermines the CEO's value proposition.
With the general financialization of the economy, the wealth effect of rising goods prices increases wealth by more than the increase in the cost of goods - so if the cost of housing goes up, more people can afford houses. This financialization is a one-way ratchet. It appears the US economy was looking forward to blowing another bubble, and now that bubble has been popped in its infancy. I think the slowness of the pop underscores how little the major players understand what has just happened - I could be wrong about that, but I don't yet see how.
Edit:
"[big companies] would much rather spend huge amounts of money on chips than hire a competent researcher who might tell them that they didn’t really need to waste so much money." (https://news.ycombinator.com/item?id=39483092 11 months ago)
The cost of having excess compute is less than the cost of not having enough compute to be competitive. Because of demand, if you realize your current compute is insufficient, there is a long turnaround to build up your infrastructure, by which point you are falling behind. All the major players are simultaneously working on increasing capabilities and reducing inference cost. What they aren't optimizing is their total investment in AI. The cost of over-investment is just a drag on overall efficiency, but the cost of under-investment is existential.
IMO you cannot fail by investing in compute. If it turns out you only need 1/1000th of the compute to train and/or run your models, great! Now you can spend that compute on inference that solves actual problems humans have.
o3's $4k compute spend per task made it pretty clear that once we reach AGI, inference is going to be the majority of spend. We'll spend compute getting AI to cure cancer or improve itself rather than just training a chatbot that helps students cheat on their exams. The more compute you have, the more problems you can solve and the faster you can solve them, and the bigger your advantage - especially if/when recursive self-improvement kicks off; efficiency improvements only widen this gap.
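To make the per-task economics concrete, here's a minimal back-of-envelope sketch. The $4k/task figure is the one cited above; the budget number is purely hypothetical, as is the `tasks_affordable` helper.

```python
def tasks_affordable(budget_usd: float, cost_per_task_usd: float) -> int:
    """How many inference tasks a fixed compute budget buys at a fixed per-task cost."""
    return int(budget_usd // cost_per_task_usd)

# Hypothetical $1B inference budget at the cited $4k/task:
print(tasks_affordable(1e9, 4e3))  # 250000 tasks
```

The point being: at these per-task costs, even enormous budgets buy a surprisingly small number of hard-problem attempts, which is why inference, not training, dominates the spend.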
Of course, optimizing for the best models would result in a mix of GPU spend and ML researchers experimenting with efficiency. And it may not make sense to spend money on researching efficiency since, as has happened, those gains are often shared for free anyway.
What I was cautioning people about was that you might not want to spend $500B on NVidia hardware only to find out rather quickly that you didn't need to. You'd have all this CapEx that you now have to try to recoup from customers for what has essentially been commoditized. That's a whole lot of money to lose very quickly. Plus there is a zero-sum power dynamic at play between the CEO and the ML researchers.
Not necessarily, if you are pushing against a data wall. One could ask: after adjusting for DeepSeek's efficiency gains, how much more compute has OpenAI spent? Is their model correspondingly better? For that matter, DeepSeek could easily afford more than $6 million in compute, so why didn't they just push the scaling?
Because they're able to pass signal on tons of newly generated tokens based on whether they result in a correct answer, rather than just fitting on existing tokens.
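The contrast above can be sketched as a toy in plain Python - this is an illustration of the idea (reward on generated samples vs. fitting a fixed corpus), not anyone's actual training code; the `verifier` and `generate_and_score` names and the arithmetic task are made up for the example.

```python
import random

def verifier(question, answer):
    """Reward signal: 1.0 if the generated answer is correct, else 0.0."""
    a, b = question
    return 1.0 if answer == a + b else 0.0

def generate_and_score(question, num_samples=8, seed=0):
    """Sample candidate answers from a toy 'model' and score each with the verifier.
    Only the correct samples would be reinforced (rejection-sampling / RL style),
    so every fresh generation yields new training signal - unlike supervised
    next-token fitting, which is bounded by the existing corpus."""
    rng = random.Random(seed)
    a, b = question
    samples = [rng.randint(0, a + b + 2) for _ in range(num_samples)]
    return [(s, verifier(question, s)) for s in samples]

scored = generate_and_score((2, 3))
correct = [s for s, r in scored if r == 1.0]
```

The design point is that the reward comes from checking the answer, not from matching reference text, which is why generated tokens can carry signal past a data wall.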
Agree. The "need to build new buildings, new power plants, buy huge numbers of today's chips from one vendor" never made any sense considering we don't know what would be done in those buildings in 5 years when they're ready.
The other side of this is that if this is over-investment (likely), then in 5 years' time compute will be much cheaper and will spur a lot of exploratory development. There are many people with many ideas, and a lot of them are just lacking the compute to attempt them.
My back-of-mind thought is that, worst case, it will be like how the US overbuilt fiber in the 90s, which paved the way for cloud and networking in the 2000s.
The whole thing feels like a giant money sink. Will there be 5-10 companies that each spend $100 billion, and then they're done, and no one else can catch up or copy their training strategy? I think much of these billions will be wasted, we'll have power plants we don't need, and then more justification for coal plants. Could it end up making electricity cheaper over time from overcapacity? I think so.
As AI or whatever gains more capability, I'm sure it will do more useful things, but I mostly see it displacing non-physical jobs. It will expand the reach of individual programmers, removing some white-collar jobs (hardly anyone uses an agent to buy their tickets anymore), but that will result in less need for programmers - fewer secretaries, even fewer humans doing actual tech support.
This just feels like radio stocks during the Great Depression in the US.
I think you're right. If someone's into tech but also follows finance/economics, they might notice something familiar—the AI industry (especially GPUs) is getting financialized.
The market forces players to churn out GPUs like the Fed prints dollars. NVIDIA doesn't even need to ship real GPUs - just hype up demand projections, performance claims, and order numbers.
Efficiency doesn't matter here. Nobody's tracking real returns—it's all about keeping the cash flowing.
The results never fell off significantly with more training. The same model with longer training time on those bigger clusters should outdo it significantly. And they can expand the MoE model sizes without the same memory and bandwidth constraints.
Still, it's very surprising that with so much less compute they did so well in the model architecture / hyperparameter exploration phase compared with Meta.