The amusing thing is that it takes several orders of magnitude less data to bring a human up to reasonably competent adulthood, which means there is something fundamentally flawed in the brute-force approach to training LLMs, if the goal is to reach human-equivalent competency.
Also, the fact that 30B models, while less capable than 300B+ models, are not a full order of magnitude less capable suggests that, all else being equal, capability scales sub-linearly with parameter count. It's even more flagrant with 4B models, honestly. The fact that those are serviceable at all is kind of amazing.
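To put that sub-linear intuition in concrete terms, here is a minimal sketch assuming a Chinchilla-style power law for loss as a function of parameter count. The function name and the constants E, A, and alpha are invented purely for illustration; they are not fitted values from any real model.

    # A minimal sketch, not a fitted model: a Chinchilla-style power law in
    # parameter count, loss(N) = E + A / N**alpha. The constants below are
    # made up to show the shape of the curve, not real measurements.

    def loss(n_params: float, E: float = 1.7, A: float = 1000.0, alpha: float = 0.3) -> float:
        """Hypothetical pretraining loss as a function of parameter count N."""
        return E + A / (n_params ** alpha)

    for n in (4e9, 30e9, 300e9):
        print(f"{n / 1e9:>4.0f}B params -> loss ~ {loss(n):.2f}")

    # With these made-up constants this prints roughly:
    #    4B params -> loss ~ 3.02
    #   30B params -> loss ~ 2.42
    #  300B params -> loss ~ 2.06

Under any power law of this shape, each extra order of magnitude of parameters buys a smaller absolute improvement than the last, which is the diminishing-returns pattern being described here.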
Both factors add up to the hunch that a point of diminishing returns must soon be met, if it hasn't already. But as long as no one asks where all the money went I suppose we can keep going for a while still. Just a few more trillions bro, we're so close.
I suspect there’s a good deal baked into the human brain that we’re not fully aware of. So babies aren’t starting from zero; they have billions of years of evolution to bootstrap from.
For example, language might not be baked in, but the software for quickly learning human languages certainly is.
To take a simple example, spiders aren’t taught by their mothers how to spin webs. They do it instinctively. So that means the software for spinning a web is in the spider’s DNA.