> Don’t many/most state of the art models take many months to train on far more data than humans need for similar tasks?
Humans generally need 18 years of pre-training followed by 4-6 years of fine-tuning before they can “one-shot” many difficult tasks. That’s way more training than any machine learning model I’m aware of.
Even for tasks like reading the newspaper and summarizing what you read, you probably had to train for 10-12 years.
I see this stance of yours parroted over and over, but a 3 year old who can tell a dog from a cat doesn't need to be trained on millions of images. It also uses way less energy to do it.
It takes basically a week on a single GPU to train AlexNet, which has human-level ImageNet performance. Let's say it's 500 W for the GPU versus around 10 W for a human brain. So that's 84 kWh for the model and 175 kWh for the baby (over 3 years at 16 h/day). That's without the half billion years of architecture and initialization tuning that the baby has. I think the model performs very favorably.
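The back-of-the-envelope numbers above check out; here's the arithmetic spelled out (all power and time figures are the comment's rough assumptions, not measurements):

```python
# Rough energy comparison; every figure here is an assumption
# taken from the comment above, not a measurement.

gpu_watts = 500            # assumed single-GPU draw while training AlexNet
train_hours = 7 * 24       # "basically a week"
model_kwh = gpu_watts * train_hours / 1000

brain_watts = 10           # assumed brain power draw (often quoted closer to 20 W)
baby_hours = 3 * 365 * 16  # 3 years at 16 waking hours/day
baby_kwh = brain_watts * baby_hours / 1000

print(f"model: {model_kwh:.0f} kWh, baby: {baby_kwh:.0f} kWh")
# model: 84 kWh, baby: 175 kWh
```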
I don't. This is so obscenely flawed in obvious ways. The energy to train the model went solely into model training, while the energy used by the baby powered a myriad of tasks besides image recognition, and the baby can presumably apply the knowledge it gained in novel ways. Not only can a baby tell a cat from a dog, it can also say what the difference is in audible language, fire neurons to operate its musculoskeletal system (albeit poorly), and perhaps even no longer shit its pants. Apples and oranges. Is model performance getting more impressive every day? Definitely. Has anyone actually demonstrated "AI"? Still nope.
The context of this thread is the cost of training brains and models on comparable tasks. Not that the model is comparable to a human in every way.
If you want to be pedantic, then only 6% of the human brain is the visual cortex, but then you also have to argue that AlexNet is horribly inefficient to train. So you cut the brain cost to 6% and the model cost to 1%. They're still within about an order of magnitude of each other (favoring the model), which I'd say is pretty close in terms of energy usage.
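Scaling the earlier figures as this comment suggests (6% of the baby's 175 kWh, 1% of the model's 84 kWh; both fractions are the thread's rough assumptions):

```python
# Charge the baby only for the visual cortex (~6% of the brain)
# and the model only 1% of AlexNet's assumed training cost.

baby_kwh = 175.2 * 0.06   # ~10.5 kWh for the visual cortex alone
model_kwh = 84.0 * 0.01   # ~0.84 kWh for an assumed 100x-more-efficient model

print(f"baby: {baby_kwh:.1f} kWh, model: {model_kwh:.2f} kWh, "
      f"ratio: {baby_kwh / model_kwh:.0f}x")
# baby: 10.5 kWh, model: 0.84 kWh, ratio: 13x
```

So under these assumptions the two stay within roughly an order of magnitude of each other, favoring the model.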
Sure, but my point is that the energy costs are within the same order of magnitude.
If you want to be pedantic, then only 6% of the human brain is the visual cortex. But AlexNet is also an inefficient model, so something like an optimized ResNet is 100x as efficient to train. So now you're at 10.5 kWh and 1.5 kWh for the baby and model respectively.
You can argue details further but I'd say the energy cost of both is fairly close.
You’re missing my original point, which is about continued, ongoing robustness that works in the low-data regime and allows pilots/astronauts to make reasonable decisions in _completely novel_ situations (as just one example).
The networks we have are trained once and work efficiently on their training dataset. They are even robust to outliers in the distribution of that dataset. But they aren’t robust to surprises: unannounced changes in the assumptions, rules, or patterns behind the data.
Even reinforcement learning still struggles with this, since self-play effectively requires being able to run your dataset/simulation quickly enough to find new policies that work. Humans don’t even have time for that, much less access to a full running simulation of their environments. Although we do generate world models, and there’s some work towards that, I believe.
> but a 3 year old can tell a dog from a cat doesn't need to be trained on millions of images
A 3 year old has 3 years of multimodal training data and RLHF, plus a few billion years of evolution that have primed and biased our visual and cognitive systems.
That requires a lot more data than machine models, which have literally zero inherent bias. Assuming you want a true apples-to-apples comparison.
I can train a bird song recognition model in about two days on a V100 that performs decently well on upwards of three thousand species and generalizes reasonably to real-world data (beyond a somewhat skewed training data distribution).
Humans are very bad at this task; it takes a massive effort to learn this many birds. In fact, it's a great counterexample to humans' few-shot learning ability...
This is under the assumption that brains start at random (the tabula rasa theory of the brain), but that doesn’t seem plausible to me. Brains have the benefit of some amount of pre-training at the time of birth. That’s why spiders don’t need to be taught how to spin complex webs, and why humans don’t need to learn how to manipulate abstract mental symbols (i.e. language).
The moronic thing about this is that humans aren't just training, they're living a life. The "training" part is one part of it, sure, but it's not the reason for existence.