My experience working on ML at a couple of FAANG-like companies is that GPUs actually tend to be too fast compute-wise, and models are often unable to come close to NVIDIA's theoretical FLOPS numbers. Very frequently, profiling shows the bottleneck is somewhere else entirely. It is very easy for your data-reading code to be the bottleneck. I have seen models where networking was the bottleneck and could not keep up with the compute, and we had to adjust the model architecture to reduce the amount of data transferred across the cluster per training step. Or maybe GPU memory bandwidth is the bottleneck: the key idea in the FlashAttention work is restructuring the attention kernel to cut reads and writes to the large-but-slow HBM and keep the working set in the smaller, faster on-chip SRAM. That is valuable work, but it is also the kind of work few people can do; it is rare for an engineer I have worked with to have the CUDA experience needed to write custom efficient kernels.

Some of the models I train use a lot of sparse tensors as features, and TensorFlow's sparse GPU support is rather bad, with many operations either falling back to CPU or, in some cases, a GPU sparse kernel that was slower than the CPU equivalent. Several times, densifying and padding tensors with a large fraction of zeros was faster than using the sparse kernels.
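To make that last point concrete, here's a minimal sketch (my own illustration, not the commenter's code; the shapes and ~5% density are made up) comparing tf.sparse.sparse_dense_matmul against densifying with tf.sparse.to_dense and running a plain tf.matmul. Which path wins depends heavily on the op, the sparsity level, and the hardware:

```python
# Minimal sketch: sparse matmul vs. "densify and pad with zeros" in TensorFlow.
# The shapes and ~5% density are invented for illustration; which path is faster
# depends on the op, sparsity and hardware.
import time

import tensorflow as tf

rows, cols, hidden = 4096, 4096, 512
density = 0.05  # fraction of non-zeros (assumed)

# Build a random sparse matrix with roughly `density` non-zeros.
mask = tf.random.uniform([rows, cols]) < density
indices = tf.where(mask)                       # int64 indices in row-major order
values = tf.random.normal([indices.shape[0]])
sparse_lhs = tf.sparse.SparseTensor(indices, values, dense_shape=[rows, cols])
dense_rhs = tf.random.normal([cols, hidden])

def bench(fn, iters=20):
    fn()  # warm-up
    start = time.perf_counter()
    for _ in range(iters):
        out = fn()
    _ = out.numpy()  # force any pending GPU work to finish
    return (time.perf_counter() - start) / iters

sparse_ms = bench(lambda: tf.sparse.sparse_dense_matmul(sparse_lhs, dense_rhs)) * 1e3
dense_lhs = tf.sparse.to_dense(sparse_lhs)     # the "densify" alternative
dense_ms = bench(lambda: tf.matmul(dense_lhs, dense_rhs)) * 1e3

print(f"sparse: {sparse_ms:.2f} ms/iter, densified: {dense_ms:.2f} ms/iter")
```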
I’m sure a few companies/models are optimized enough to fit the ideal case, but it's rare.
Edit: Another aspect of this is that which model architectures are "good" today is very hardware-driven. A major advantage of transformers over recurrent LSTM models is training efficiency on GPUs; the gap in training efficiency between the two architectures is much more dramatic on GPU than on CPU. Similarly, other architectures with sequential components, like tree-structured/recursive dynamic models, tend to fit GPUs badly performance-wise.
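For anyone who hasn't internalized why the gap is so much bigger on GPU: a recurrent model has to walk the sequence one step at a time, while attention over the whole sequence is just a few big matmuls the GPU can chew through in parallel. A toy NumPy sketch of the data flow (no learned weights or training loop, made-up shapes):

```python
# Toy illustration of why recurrence vs. attention matters on a GPU:
# the RNN loop is a serial chain over time steps, the attention path is a
# couple of large matmuls over the whole sequence at once. NumPy, made-up shapes.
import numpy as np

seq_len, d = 1024, 256
x = np.random.randn(seq_len, d)

# Recurrent: step t depends on step t-1, so this loop cannot be parallelized.
W = np.random.randn(d, d) * 0.01
h = np.zeros(d)
for t in range(seq_len):
    h = np.tanh(x[t] + h @ W)

# Attention-style: all positions interact in one shot via dense matmuls,
# exactly the workload GPUs are built for.
scores = x @ x.T / np.sqrt(d)                                  # (seq_len, seq_len)
weights = np.exp(scores - scores.max(axis=-1, keepdims=True))  # row-wise softmax
weights /= weights.sum(axis=-1, keepdims=True)
out = weights @ x                                              # (seq_len, d)
```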
Let me reframe the question: assume it's not only 100x GPUs, but that all the performance bottlenecks you've mentioned are also solved or accelerated 100x.
What kind of improvement would we observe, given the current state of the models and our knowledge?
If I assume you mean LLM-like models similar to ChatGPT, that is pretty heavily debated in the community. Several years ago many people in the ML community believed we were at a plateau and that throwing more compute/money at the problem would not give significant improvements. Then LLMs did much better than expected as they scaled up, and they continue to improve on various benchmarks now.
So are we now at a performance plateau? I know people at OpenAI-like places who think AGI is likely in the next 3-5 years and is mostly a matter of scaling up context/performance plus a few other key bets. I know others who think it is unlikely within the next few decades.
My personal view is that I would expect a 100x speedup to make ML used even more broadly and to allow more companies outside the big players to have their own foundation models tuned for their use cases, or other specialized domain models outside language modeling. Even now I still see tabular datasets (recommender systems, pricing models, etc.) as the most common kind of work in industry jobs. As for the impact 100x compute would have on leading models like OpenAI's/Anthropic's, I honestly have little confidence in what will happen.
The rest of this is very speculative and I'm not sure of it, but my gut feeling is that we still need other algorithmic improvements, like better ways for models to store memories they can later query/search. Honestly, part of that is the math/CS background in me not wanting everything to end up being a hardware problem. The other part is that I'm doubtful human-like intelligence is so compute-expensive that we can't find more cost-efficient ways for models to learn, but maybe our nervous system is just much faster at parallel computation?
The human brain manages to work with about 0.3 kWh per day. Even if we say all of that is used for training "models", over twenty years that's only about 2,200 kWh, much less than what ChatGPT needed to train (500 MWh?). So there are obviously lots of things we could do to improve efficiency. On the other hand, our brains had hundreds of millions of years to be optimized for energy consumption.
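Taking the commenter's two estimates at face value (0.3 kWh/day for the brain, ~500 MWh for training), the arithmetic works out to roughly a 200x gap:

```python
# Quick check of the numbers above; both inputs are the commenter's estimates.
brain_kwh_per_day = 0.3
brain_total_kwh = brain_kwh_per_day * 365 * 20   # twenty years -> ~2,190 kWh
training_kwh = 500 * 1000                        # 500 MWh in kWh (speculated)
print(f"brain, 20 years: {brain_total_kwh:,.0f} kWh")
print(f"training / brain: ~{training_kwh / brain_total_kwh:.0f}x")   # ~228x
```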
A friend showed me some python code or something that demonstrates facial recognition by calculating the distance between facial features - eyes, nose...
I had never thought about this before but how do I recognize faces? I mostly recognize faces by context. And I don't have to match against a billion faces, probably a hundred or so? And I still suck at this.
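The friend's code isn't reproduced here, but the general shape of "recognize by distance between feature vectors" is something like this toy sketch. The feature values and the threshold are invented, and real systems use learned embeddings rather than hand-measured landmark distances, but the matching step is the same idea:

```python
# Toy nearest-neighbour face matching on feature vectors. The features
# (e.g. normalized eye spacing, nose-to-mouth distance, ...) and the threshold
# are invented; real pipelines use learned embeddings, but the matching step
# is the same distance comparison.
import numpy as np

known_faces = {
    "alice": np.array([0.42, 0.31, 0.18, 0.55]),
    "bob":   np.array([0.39, 0.35, 0.22, 0.48]),
}

def identify(query, gallery, threshold=0.05):
    # Pick the gallery face whose feature vector is closest in Euclidean distance.
    name, dist = min(
        ((n, np.linalg.norm(query - v)) for n, v in gallery.items()),
        key=lambda item: item[1],
    )
    return name if dist < threshold else "unknown"

print(identify(np.array([0.41, 0.32, 0.19, 0.54]), known_faces))  # -> alice
```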
The fact that the human brain works with 0.3 kWh per day likely doesn't mean much. How do we even start asking the question: is the human brain thermally (or, more generally, resource) constrained?
The brain is responsible for about 1/5 of the total energy expenditure (and therefore food requirement) of a human body. So yes, on a biological level, there are significant resource constraints on the human brain. What is less clear is whether this actually holds for the "computing" part (as contrasted with the "sustainment" part, think cell replacement).
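A back-of-the-envelope check of that 1/5 figure, assuming a typical ~2,000 kcal/day intake (my assumption, not a number from the thread): it lands around 19 W, the same ballpark as the ~0.3 kWh/day mentioned upthread.

```python
# Back-of-the-envelope: what does "1/5 of total energy expenditure" work out to?
# The 2,000 kcal/day intake is an assumed typical-adult figure, not from the thread.
KCAL_TO_KWH = 4184 / 3.6e6        # 1 kcal = 4184 J, 1 kWh = 3.6e6 J
brain_kwh_per_day = 2000 * KCAL_TO_KWH * (1 / 5)
brain_watts = brain_kwh_per_day * 1000 / 24
print(f"~{brain_kwh_per_day:.2f} kWh/day, i.e. ~{brain_watts:.0f} W")  # ~0.46 kWh/day, ~19 W
```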
I've seen improvement numbers of up to 12x, but after that the returns diminish so quickly that there's not really a point. 12x on training costs, I mean; it probably still won't get us AGI.
Well put. This was my experience when working for an AI startup too.
Frustratingly, it's also the hardest part to solve. Throwing more compute at the problem is easy, but diagnosing and then solving those other bottlenecks takes a great deal of time, not to mention experience across a number of specialty domains that don't often overlap.