That seems super far-fetched given that 37%[1] of the world's population does not have internet access. You could reasonably restrict further to populations that speak languages that are even passably represented in LLMs.
Even disregarding that, if you're making <3000 euros a year, I really don't think you'd be willing or able to spend that much money to let your computer gaslight you.
But we are not dealing with public data here. Stalking people, recording their every step and action so that you can sell their behavioural habits, is not collecting public data; it's stalking and invading people's private lives.
The gap between high-level and low-level control of robots is closing. Right now, thousands of hours of task-specific training data are being collected and trained on to create models that can control robots to execute specific tasks in specific contexts. This essentially turns the operation of a robot into a kind of video game, where inputs are only needed in a low-dimensional, abstract form, such as "empty the dishwasher" or "repeat what I do" or "put your finger in the loop and pull the string".
This will be combined with high-level control agents like SIMA 2 to create useful real-world robots.
I work on a much easier problem (physics-based character animation) after spending a few years in motion planning, and I haven’t really seen anything to suggest that the problem is going to be solved any time soon by collecting more data.
"We present Dreamer 4, a scalable agent that learns to solve control tasks by imagination training inside of a fast and accurate world model. ... By training inside of its world model, Dreamer 4 is the first agent to obtain diamonds in Minecraft purely from offline data, aligning it with applications such as robotics where online interaction is often impractical."
In other words, it learns by watching, i.e. by having more data of a certain type.
I am pushing the optimism a bit, of course, but currently we can see many demos of robots doing basic tasks, and it seems quite easy nowadays to do this with the data-driven approach.
The problem becomes complicated once the large discrete objects are not actuated. Even worse if the large discrete objects are not consistently observable because of occlusions or other sensor limitations. And almost impossible if the large discrete objects are actuated by other agents with potentially adversarial goals.
Self-driving cars, an application in which the physics is simple and arguably two-dimensional, have taken more than a decade to get to a deployable solution.
Next to zero cognition was involved in the process. There's some kind of hierarchy of thought in the way my mind/brain/body processed the task. I did cognitively decide to get the beer, but I was focused on something at work and continued to think about that in great detail as the rest of me did all of the motion planning and articulation required to get up, walk through two doorways, open the door on the fridge, grab a beer, close the door, walk back and crack the beer as I was sitting down.
Basically zero thought in that entire sequence.
I think what's happening today with all of this stuff is ultimately like me trying to play Fur Elise on piano. I don't have a piano. I don't know how to play one. I'm going to be all brain in that entire process and it's going to be awful.
We need to learn how to use the data we have to train these layers of abstraction that allow us to effectively compress tons of sophistication into 'get a beer'.
> This essentially turns the operation of a robot into a kind of video game, where inputs are only needed in a low-dimensional, abstract form, such as "empty the dishwasher" or "repeat what I do" or "put your finger in the loop and pull the string"
I don't really understand: how is this like a video game? What about these inputs is "low-dimensional"? How does what you describe interact with "high-level control agents like SIMA 2"? Doesn't SIMA 2 translate inputs like "empty the dishwasher" into key presses or interaction with some other direct control interface?
Say you want to steer an android to walk forward. You need to provide angles or forces or voltages for all the actuators at every moment in time, so that's high-dimensional. If you already have certain control models, neural or not, you can instead just press forward on a joystick. So what I mean by low-dimensional input is when someone steers a robot using a controller: that's got like, idk, 10-20 dimensions max. And my understanding is that SIMA 2, when it plays No Man's Sky or whatever, basically provides such low-dimensional controls, like a video game. Companies like Figure and Tesla are training models that can do tasks like folding clothes or emptying the dishwasher given low-dimensional inputs like "move in this direction and tidy up". SIMA has the understanding to provide these inputs.
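To make that dimensionality contrast concrete, here's a toy Python sketch; the actuator count, control rate, and the walk_policy stub are invented for illustration, not taken from any real robot stack:

```python
import numpy as np

N_ACTUATORS = 30   # hypothetical humanoid joint count
CONTROL_HZ = 100   # hypothetical low-level control rate

def low_level_command():
    """Raw control: one target per actuator, CONTROL_HZ times per second.
    That's the high-dimensional interface."""
    return np.zeros(N_ACTUATORS)  # 30 numbers per tick

def joystick_command(direction_x, direction_y, speed=1.0):
    """Low-dimensional interface: 3 numbers, at whatever rate the user likes."""
    return np.array([direction_x, direction_y, speed])

def walk_policy(joystick, robot_state):
    """Stand-in for a learned controller bridging the two levels: maps a
    3D command plus proprioception to N_ACTUATORS actuator targets.
    Returns zeros here; a trained model would go in its place."""
    return np.zeros(N_ACTUATORS)
```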
This is pretty cool. I have several points to make.
1. We all know that cellular automata, or more generally any dynamical system of sufficient complexity (and maybe not too much complexity), will be Turing complete, will have complicated "uncomputable" behavior, and will perhaps have pattern formation, gliders, solitons, etc.
So what is a valuable addition to these computational investigations? I think when studying emergent computational behavior we really care about the ratio of dynamics complexity to rule complexity. It's not impressive to get complicated dynamics out of a complicated system; the simplicity of the Game of Life is what made it really impressive.
I think in that regard LACE is pretty nice: the rule still feels very simple/natural and you can get much more structured/complex behavior with fewer cells.
2. Nevertheless, in the end this blog shows mostly pretty pictures of computational, complex, emergent, chaotic behavior, which we've all seen before. And the key features that would make the difference, toward something I would call physics-like, are still missing.
And I guess that would be complex stable patterns that can have complex stable interactions. Who knows maybe there are 10^16-celled patterns that have this but we don't know.
3. If I were you I would cut the whole preamble. It will make people take you less seriously than they should. You don't want to look like a crank.
+1 to this copy being a little bit over-the-top. This is neat, but, as you pointed out, at the end of the day this is still computationally equivalent to normal 2D cellular automata. I suspect (not taking the time to prove this) that it's equivalent in a fairly obvious way, which is that you could just replace "links" with 8*<num link states> additional sub-states per cell. The only real difference is just in how it's visualized.
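For what it's worth, here's a rough Python sketch of that reduction. NUM_LINK_STATES is a placeholder, and this glosses over the fact that each link is shared by two cells (a careful construction would store each link on only one of its endpoints):

```python
NUM_CELL_STATES = 2
NUM_LINK_STATES = 3  # hypothetical; LACE's actual counts may differ

def pack(cell_state, link_states):
    """Encode (cell state, 8 link states) as one integer in a larger
    per-cell alphabet, turning the linked CA into an ordinary 2D CA."""
    assert len(link_states) == 8
    code, base = cell_state, NUM_CELL_STATES
    for link in link_states:
        code += link * base
        base *= NUM_LINK_STATES
    return code

def unpack(code):
    """Inverse of pack()."""
    cell_state, code = code % NUM_CELL_STATES, code // NUM_CELL_STATES
    links = []
    for _ in range(8):
        links.append(code % NUM_LINK_STATES)
        code //= NUM_LINK_STATES
    return cell_state, links
```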
Seems like maybe not the only instance of self-aggrandizement in this blog... the sidebar shows posts about things like the quantum mechanics of consciousness and other such quackery.
Wow. One of the links is titled "Clarifying the Hierarchy of Information-Derived Gravitational Tensors in Consciousness Field Theory". The References section has 4 entries, all written by the author of this blog.
Actually, on that particular topic, you might want to read the articles and the math before forming an opinion. Start from information geometry and Fisher information metrics of information processing, which is what that particular line of thought is all about. Everything on that line of exploration is built up from Fisher information, where "consciousness" is defined as a high-complexity information-processing geometry and is therefore measurable. Because it's defined as a measurable physical process, not as a magical quality, it's a mathematical argument. But of course that would require reading, and sufficient knowledge to understand it.
The observation that other CA can be equivalent is a weak critique at best: this CA may be a nice compact way of describing types of CA that have interesting properties. It is not terribly interesting that it may be subsumed by some other CA; it may be some interesting unstudied subset.
For instance, the Game of Life is a subset of 2D binary-state CA: its rule only takes the totals of neighboring cells, and so it is a subset of those CAs whose rules care about specific patterns of neighbors.
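As a concrete illustration of "the rule only takes the totals", here's a minimal NumPy sketch of one Game of Life step; note it never inspects which neighbors are alive, only how many:

```python
import numpy as np

def life_step(grid):
    """One Game of Life step on a toroidal 0/1 grid."""
    # Total of the 8 neighbors, computed by rolling the grid around.
    neighbors = sum(
        np.roll(np.roll(grid, dy, axis=0), dx, axis=1)
        for dy in (-1, 0, 1) for dx in (-1, 0, 1)
        if (dy, dx) != (0, 0)
    )
    # Birth on exactly 3 live neighbors; survival on 2 or 3.
    return ((neighbors == 3) | ((grid == 1) & (neighbors == 2))).astype(grid.dtype)
```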
These rules use very different principles from traditional cell-state rules: for example, neighbor degree, number of connections, and eligibility criteria based on connectivity. So cells become alive or dead not based on the states of their neighbors, but rather on the topology of their neighborhoods.
The details are beyond the scope of a short write-up, but they are easy to explore in the rule editor in the code's GUI.
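To show the flavor of a topology-driven update, here's a toy sketch on a general graph; the thresholds are invented for illustration and are not LACE's actual rules:

```python
def topology_step(alive, adjacency):
    """alive: set of live node ids; adjacency: dict node -> set of neighbor ids.
    A node's fate depends on the structure of its neighborhood (its degree)
    and its connectivity into the live region."""
    next_alive = set()
    for node, neighbors in adjacency.items():
        degree = len(neighbors)              # purely structural quantity
        live_links = len(neighbors & alive)  # connectivity into the live region
        # Hypothetical eligibility rule: well-connected nodes whose links
        # mostly reach live nodes survive or are born.
        if degree >= 3 and 2 * live_links >= degree:
            next_alive.add(node)
    return next_alive
```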
And the preamble has been pruned of the historical anecdote behind this.
An interesting approach to characterizing graph topology, both locally and globally, is to use a graphlet transform. There's some interesting research happening around these types of topology signals; here's one that takes a very algebraic approach.
Though what has not been brought up is the several ways that everything SpaceX is flying can be bricked simultaneously, so these failure modes present unique risks.
Hold on a second. A transformer deterministically produces a probability distribution over the token alphabet from the context. Then one samples from this distribution. This is random and meant to be random.
The sampling process isn't random. If you sample with identical sampling parameters and identical values for said parameters, you will always get the same results. You only start getting "non-deterministic" behavior when you use more complex systems outside your control, like multi-GPU systems and batch processing. One LLM sampled with prompt caching off and batch processing off will always generate the same results if all values are the same.
It's possible to deterministically sample from a probability distribution. For example, just seed your RNG with a constant, or with the SHA256 hash of the context.
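A minimal sketch of that second option, using a toy token distribution rather than a real model:

```python
import hashlib
import random

def deterministic_sample(context, tokens, probs):
    """Seed the sampler from the context itself, so the draw is a pure
    function of (context, distribution)."""
    seed = int.from_bytes(hashlib.sha256(context.encode()).digest()[:8], "big")
    rng = random.Random(seed)
    return rng.choices(tokens, weights=probs, k=1)[0]

# Same context + same distribution -> same token, every time.
print(deterministic_sample("The cat sat on the", ["mat", "hat", "dog"], [0.7, 0.2, 0.1]))
```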
Well yes, you can "hack" the pseudorandom number generator, but... that's not really the point when talking about determinism in LLMs, is it? I mean, the mathematical idea of the standard LLM is certainly truly random.
> I mean the mathematical idea of the standard LLM is certainly truly random.
Not really: LLMs give you a distribution over possible next tokens.
You are free to then sample from this distribution how you want.
There is no need to hack the RNG or whatever: for example, you can simply take a greedy approach and always output the most likely token, in which case the LLM becomes deterministic (mathematically). This is equivalent to setting the temperature to 0.
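In toy form (assuming the raw scores are already in hand; exact ties would need a fixed tie-break rule to stay fully deterministic):

```python
def greedy_next_token(logits):
    """Greedy (temperature-0) decoding: no RNG involved, the output is a
    deterministic function of the scores. logits: dict token -> score."""
    return max(logits, key=logits.get)

print(greedy_next_token({"mat": 2.1, "hat": 0.3, "dog": -1.0}))  # -> "mat"
```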
After refreshing a few times I finally got an English question: it asked me to act like a Linux terminal, then issued pwd, ls, then cd over and over until I gave up. The concept is funny, where I get to act like CrapGPT, but it needs to not get stuck asking the same thing over and over.
Maybe the role reversal breaks most of the RLHF training. The training was definitely not done in the context of role reversal, so it could be out of distribution. If so, this is a glimpse of the intelligence of the LLM core without the RL/RAG/etc. tape-and-glue layers.