"We present Dreamer 4, a scalable agent that learns to solve control tasks by imagination training inside of a fast and accurate world model. ... By training inside of its world model, Dreamer 4 is the first agent to obtain diamonds in Minecraft purely from offline data, aligning it with applications such as robotics where online interaction is often impractical."
In other words, it learns by watching, i.e. from having more data of a certain type.
Suppose we simplify the scenario and think of experiences as draws from a discrete probability distribution, e.g. p=[0.1, 0.1, 0.7, 0.1].
Suppose further that every event is a draw of type 1, 2, 3, or 4, and that our memory keeps a count of each type and updates the distribution accordingly - it is essentially a frequency distribution.
When we encounter a stimulus, we have to (1) recognize it and (2) assign a reward valence to it. If we only ever observed '3', the distribution would become very peaked. Correspondingly, this suggests that we would recognize '3' events faster and be better at assigning a reward valence to those events.
Then if we ever encounter a non-3 event, we would recognize it more slowly - it is well established that recognition is tied to encounter frequency - and do a poorer job assigning reward valence to it. Together, this means that we would do a bad job selecting the appropriate response.
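To make the counting argument concrete, here is a minimal Python sketch, not a claim about how memory actually works. The starting counts mirror p=[0.1, 0.1, 0.7, 0.1]; the helper names and the use of surprisal (-log2 p) as a stand-in for "recognition cost" are my own assumptions for illustration.

```python
import math
from collections import Counter

EVENT_TYPES = [1, 2, 3, 4]

def frequency_distribution(counts):
    """Normalize raw event counts into a frequency distribution."""
    total = sum(counts[e] for e in EVENT_TYPES)
    return {e: counts[e] / total for e in EVENT_TYPES}

def surprisal(p):
    """Surprisal in bits; used here only as an assumed proxy for recognition cost."""
    return math.inf if p == 0 else -math.log2(p)

# Memory as a running count: 100 past events matching p = [0.1, 0.1, 0.7, 0.1].
counts = Counter({1: 10, 2: 10, 3: 70, 4: 10})
before = frequency_distribution(counts)

# Now we only ever observe '3' for a long stretch.
counts.update([3] * 900)
after = frequency_distribution(counts)

for e in EVENT_TYPES:
    print(e, round(before[e], 3), round(after[e], 3),
          round(surprisal(before[e]), 2), round(surprisal(after[e]), 2))
```

Under these assumptions the distribution peaks on '3' while every non-3 event becomes rarer in memory, so its surprisal goes up, which is the "flat-footed" effect described above.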
Perhaps this scenario-based dreaming keeps us (and rats) primed so we're not flat-footed in new scenarios.
The question then becomes - if these scenarios are purely imagined, where are they being sampled from? If we never observe 1, 2, or 4, how do we know that this is the true list of alternative scenarios?
Well, please show me any mainstream finance media that questions it. The Fed has a monthly press conference, I don't think they were asked even once in the last 10 years at least... Most of the buying now is from Asia supposedly.
Yes, but the overall framework - that our thinking can be broadly divided into habits/reaction and deliberation/forecasting - still holds, with a ton of supporting neural evidence.
I don’t think even that is true or particularly useful. Why does it need to be binary? Couldn’t it be a spectrum? Are “fast thoughts” always habits and reactions? Some people can calculate large numbers very quickly, some people describe their inspiration for complex ideas happening in a single moment like a muse delivered it to them. Some people process thoughts and feelings for decades until they suddenly snap. I don’t think this “fast vs slow thoughts” framework really has any value to it, and it excludes a lot of thought patterns.
In terms of being useful, it depends on the field. In reinforcement learning, for example, the entire field is divided into 'model-free' and 'model-based' approaches. Model-free approaches are learning state-action mappings, which are like habits or 'fast' thinking. Model-based approaches try to build a world model that allows planning and forecasting.
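A rough sketch of that split, on a made-up 5-state corridor (everything here, including the task, names, and hyperparameters, is hypothetical and only for illustration). The model-free agent uses tabular Q-learning to learn state-action values directly from experience; the model-based agent first records a transition model and then plans inside it with value iteration.

```python
import random

N_STATES = 5          # states 0..4; state 4 is the goal (toy task, assumed)
ACTIONS = (-1, +1)    # step left or step right
GAMMA = 0.9
GOAL = N_STATES - 1

def step(state, action):
    """Deterministic toy environment: reward 1 on reaching the goal."""
    nxt = min(max(state + action, 0), GOAL)
    return nxt, (1.0 if nxt == GOAL else 0.0)

def q_learning(episodes=500, alpha=0.5, epsilon=0.2, seed=0):
    """Model-free: learn a state-action table ('habits') by trial and error."""
    rng = random.Random(seed)
    q = {(s, a): 0.0 for s in range(N_STATES) for a in ACTIONS}
    for _ in range(episodes):
        s = 0
        while s != GOAL:
            if rng.random() < epsilon:
                a = rng.choice(ACTIONS)          # explore
            else:
                best = max(q[(s, b)] for b in ACTIONS)
                a = rng.choice([b for b in ACTIONS if q[(s, b)] == best])
            s2, r = step(s, a)
            future = 0.0 if s2 == GOAL else max(q[(s2, b)] for b in ACTIONS)
            q[(s, a)] += alpha * (r + GAMMA * future - q[(s, a)])
            s = s2
    return {s: max(ACTIONS, key=lambda a: q[(s, a)]) for s in range(GOAL)}

def learn_model():
    """Model-based, part 1: record what each action does in each state."""
    return {(s, a): step(s, a) for s in range(GOAL) for a in ACTIONS}

def plan_with_model(model, sweeps=50):
    """Model-based, part 2: plan inside the learned model (value iteration)."""
    v = [0.0] * N_STATES

    def lookahead(s, a):
        s2, r = model[(s, a)]
        return r + GAMMA * v[s2]

    for _ in range(sweeps):
        for s in range(GOAL):
            v[s] = max(lookahead(s, a) for a in ACTIONS)
    return {s: max(ACTIONS, key=lambda a: lookahead(s, a)) for s in range(GOAL)}

print("model-free policy: ", q_learning())
print("model-based policy:", plan_with_model(learn_model()))
```

Both end up with the same "go right" policy here; the difference is that the second agent never has to act in the environment while planning, which is roughly the sense in which imagination training works inside a world model.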
If you're looking for utility in terms of understanding cognition, here are some resources if you're interested - pop-sci books written by cognitive scientists:
Wolfe wrote in a letter, "My definition of a great story has nothing to do with 'a varied and interesting background.' It is: One that can be read with pleasure by a cultivated reader and reread with increasing pleasure."
The incentive of all psychology researchers is to do new work rather than replications. Because of this, publicly-funded psychology PhDs should be required to perform study replication as part of their training. Protocol + results should be put in a database.
Sure, dump it on the lowest level employee, who has the least training and the most to lose. Punish them for someone else's bad research. Grad school already takes too long, pays too little, and involves too much risk of not finishing. And it doesn't solve the problem of people having to generate copious quantities of research in order to sustain their careers.
This is neat - potentially - but the authors should include some benchmarks where this added complexity increases system performance. Otherwise it could be a solution in search of a problem.
Yes, I want to run benchmarks, but I haven’t found a suitable benchmark library yet, and I’m still not great with Python.
Anda AI App has integrated KIP, but based on its performance, it hasn’t met expectations yet. Current LLMs like GPT-5 and Gemini 2.5 Pro still can’t effectively power KIP.
"We present Dreamer 4, a scalable agent that learns to solve control tasks by imagination training inside of a fast and accurate world model. ... By training inside of its world model, Dreamer 4 is the first agent to obtain diamonds in Minecraft purely from offline data, aligning it with applications such as robotics where online interaction is often impractical."
In other words, it learns by watching, e.g. by having more data of a certain type.
reply