I wonder if anyone who’s really *in the know* could summarize where the research...

johnsmith1840 · 2025-06-13T23:49:11 1749858551

We have no idea how to do continual learning.

Many people here are right: compute, collapse, forgetting, whatever.

The only "real" way to do this would be: 1. Train a model. 2. Get new data. 3. Retrain the model in full on old + new data. 4. Repeat. 5. You still have no guarantee on the "time" aspect, though.
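As a rough sketch of that loop (everything here is a toy stand-in, not a real training setup: `train_from_scratch` is a hypothetical placeholder whose "model" is just the mean of the data), note how the work per cycle keeps growing because every retrain touches everything seen so far:

```python
def train_from_scratch(dataset):
    """Hypothetical stand-in for a full training run; the 'model' is just the data mean."""
    return sum(dataset) / len(dataset)


def retrain_in_full(initial_data, new_batches):
    """Steps 1-4: train, receive new data, retrain on old + new, repeat."""
    seen = list(initial_data)
    models = [train_from_scratch(seen)]          # 1. train a model
    examples_processed = len(seen)
    for batch in new_batches:                    # 2. new data arrives
        seen.extend(batch)
        models.append(train_from_scratch(seen))  # 3. retrain in full on old + new
        examples_processed += len(seen)          # cost grows with everything seen so far
    return models, examples_processed            # 4. "repeat" is the loop itself


models, examples_processed = retrain_in_full([1.0, 2.0, 3.0], [[4.0], [5.0, 6.0]])
print(models, examples_processed)
```

Nothing in this loop records which data came first, which is the "time" problem in point 5.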

But CL as a field basically has zero answers on how to do this in a true sense. It's crazy hard because the "solutions" are hypocritical in many ways.

We need to expand the model's representation space while keeping the previous representation space nearly the same?

Basically, you need to modify it without changing it.

Most annoying is that even the smallest of natural brains do this easily. I have a long-winded theory, but basically it boils down to this: AI likely needs to "sleep" or rest somehow.

mackenziebowes · 2025-06-14T00:29:16 1749860956

The cool thing about AI that I'm seeing as an outsider/non-academic, is that it's relatively cheap to clone. Sleeping/resting could be done by a "clone" and benefits could be distributed on a rolling schedule, right?

johnsmith1840 · 2025-06-14T00:38:42 1749861522

One clone takes a nap while the other works is pretty cool.

But the clone couldn't run without sleeping? So that's more of a teammate than a clone.

1 works while the other sleeps and then swap.

If this method ever worked, our current alignment methods would get chucked out the window: those would be two completely different AIs.

mackenziebowes · 2025-06-14T01:36:45 1749865005

I can't be certain, I'm not at all an AI engineer or math guy, but I think at the "wake up" point you equalize instances. Like during 'sleep' some list of functions/operations `m` are applied to model weights `n` producing a new model, `n + 1`. Wouldn't you just clone `n + 1`, send it to work, and start a new training run `m + 1` to make `n + 2`?
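In code, the cycle I'm picturing might look roughly like this. It's only a sketch: `sleep_phase` is a made-up placeholder for whatever the operations `m` actually are (here it just nudges each weight toward the mean of the logged experience), and the "model" is a plain list of floats.

```python
import copy


def sleep_phase(weights, experience, lr=0.5):
    """Hypothetical offline update 'm': move each weight toward the mean of the experience."""
    target = sum(experience) / len(experience)
    return [w + lr * (target - w) for w in weights]


def run_cycles(weights, experience_per_cycle):
    """Deploy version n, train n + 1 offline, then clone n + 1 and redeploy it."""
    deployed = copy.deepcopy(weights)                  # version n, out doing the work
    history = [deployed]
    for experience in experience_per_cycle:
        candidate = sleep_phase(deployed, experience)  # version n + 1, built during "sleep"
        deployed = copy.deepcopy(candidate)            # "wake up": equalize instances
        history.append(deployed)
    return history


history = run_cycles([0.0, 2.0], [[1.0, 1.0], [3.0]])
print(history)
```

Each entry in `history` is one deployed version, so at any moment there is only one set of weights doing the work while the next one is being produced.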

notpushkin · 2025-06-14T06:11:35 1749881495

This was my first idea as well. Keep training continuously and redeploy clones after each cycle. From a layman perspective this seems reasonable :thinking:

maleldil · 2025-06-15T02:01:00 1749952860

You can't realistically keep training the same model forever, or it will start forgetting things it knew before. The proper name for this is "catastrophic forgetting".
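A toy illustration, assuming a two-weight linear model trained with plain SGD (the tasks, learning rate, and step counts are all made up for the example). Both tasks can be solved at once by a single set of weights, but training on them one after the other still wrecks the first:

```python
def predict(w, x):
    return w[0] * x[0] + w[1] * x[1]


def mse(w, data):
    return sum((predict(w, x) - y) ** 2 for x, y in data) / len(data)


def train(w, data, lr=0.1, steps=200):
    """Plain SGD on squared error; returns a new weight list."""
    w = list(w)
    for _ in range(steps):
        for x, y in data:
            err = predict(w, x) - y
            w[0] -= lr * err * x[0]
            w[1] -= lr * err * x[1]
    return w


task_a = [((1.0, 1.0), 2.0)]    # wants w0 + w1 = 2
task_b = [((1.0, 0.0), -3.0)]   # wants w0 = -3

w_a = train([0.0, 0.0], task_a)
loss_a_before = mse(w_a, task_a)      # close to 0 right after learning task A
w_ab = train(w_a, task_b)             # keep training, but only on task B
loss_a_after = mse(w_ab, task_a)      # task A error is now large
w_joint = train([0.0, 0.0], task_a + task_b, steps=2000)  # retrain on everything

print(loss_a_before, loss_a_after, mse(w_joint, task_a), mse(w_joint, task_b))
```

The shared weight `w[0]` gets overwritten by task B, and nothing pulls `w[1]` along to compensate. Training on both tasks together finds weights that fit both, which is why "retrain on everything" works and "just keep training" doesn't.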

khalic · 2025-06-14T09:56:32 1749894992

You should look into LoRA: it’s a partial retraining method that doesn’t require nearly as much compute as retraining the whole model. It’s different from what this paper is suggesting. The self-improvements in this paper even set the rules for the improvements, basically creating new data out of what the model already has.

LoRA paper: https://arxiv.org/abs/2106.09685
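For a rough sense of why it's cheaper: LoRA freezes the pretrained weight matrix `W0` (shape d x k) and only trains two small matrices `B` (d x r) and `A` (r x k), so the layer computes `W0 x + B A x`. A minimal sketch in plain Python, with illustrative sizes I picked myself and the paper's scaling factor left out:

```python
def full_finetune_params(d, k):
    """Trainable parameters when updating the whole d x k matrix."""
    return d * k


def lora_params(d, k, r):
    """Trainable parameters when only B (d x r) and A (r x k) are updated."""
    return r * (d + k)


def matvec(m, v):
    return [sum(a * b for a, b in zip(row, v)) for row in m]


def lora_forward(w0, b, a, x):
    """h = W0 x + B (A x); W0 stays frozen, only A and B would be trained."""
    base = matvec(w0, x)
    delta = matvec(b, matvec(a, x))
    return [p + q for p, q in zip(base, delta)]


d, k, r = 4096, 4096, 8                  # illustrative sizes, not from the thread
full = full_finetune_params(d, k)
lora = lora_params(d, k, r)
print(full, lora, full / lora)

w0 = [[1.0, 0.0], [0.0, 1.0]]
a = [[1.0, 1.0]]                         # rank r = 1
b_zero = [[0.0], [0.0]]                  # B starts at zero, so the output is unchanged
print(lora_forward(w0, b_zero, a, [1.0, 2.0]))
```

With these sizes the adapter has 256x fewer trainable parameters than the full matrix, and because `B` starts at zero the adapted model initially behaves exactly like the base model.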

zelphirkalt · 2025-06-15T08:07:18 1749974838

This only seems to be the case with the current crop of models. "Online learning" is a term for having models deployed and keeping them learning and it has been around for more basic models for a long time.

johnsmith1840 · 2025-06-15T17:23:34 1750008214

Not sure how much you've gotten into CL but online learning while similar is not the same.

Online learning is more akin to RL in that it's a structured and boxed environment. Step outside of that box, or let the box change too much, and you collapse.

CL is much more similar to meta learning. The concepts are more about learning NEW content while keeping previous the same.

CL is a completely open problem with all model types. EWC is among the better attempts (and a favorite of mine) at solving it, though with big limitations.
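For anyone who hasn't seen it, the core of EWC is a quadratic penalty that anchors each weight to its value after the old task, weighted by how important that weight was (a diagonal Fisher information estimate). A minimal sketch as I understand it from the paper, with the Fisher values simply passed in as numbers rather than estimated, and all the example values made up:

```python
def ewc_loss(task_b_loss, theta, theta_star_a, fisher, lam):
    """New-task loss plus (lam / 2) * sum_i F_i * (theta_i - theta*_A,i)^2."""
    penalty = sum(
        f * (t - t_star) ** 2
        for f, t, t_star in zip(fisher, theta, theta_star_a)
    )
    return task_b_loss + 0.5 * lam * penalty


theta_star_a = [1.0, 1.0]   # weights after learning task A
fisher = [10.0, 0.5]        # weight 0 mattered a lot for task A, weight 1 barely
lam = 2.0                   # how strongly to protect task A

# Moving the unimportant weight by 2 is cheap...
cheap = ewc_loss(0.25, [1.0, 3.0], theta_star_a, fisher, lam)
# ...moving the important weight by the same amount is expensive.
costly = ewc_loss(0.25, [3.0, 1.0], theta_star_a, fisher, lam)
print(cheap, costly)
```

So the new task is pushed to use the weights the old task didn't care about, which is the "modify it without changing it" idea in practice.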

Inviz · 2025-06-15T05:18:05 1749964685

Evolving prompts seems to fit the "modify without changing" bill, does it?

johnsmith1840 · 2025-06-15T17:32:30 1750008750

Yes but it's similar to RNNs or energy models.

They try to keep a single continuous "state" that always updates.

It's more about going "farther" than the "go forever" that CL promises.

Scaling laws are true in that infinite scale would 100% lead to AGI. But at the same time the problem with it is that you can't infinitely scale the computation per task.

RL solves this problem in general but it has a deep assumption of knowing the future. Step too far out of the box and it collapses.

The smallest natural brains handle unknown future states with a fixed computation budget per timestep which is truly incredible.

Davidzheng · 2025-06-14T00:52:42 1749862362

But natural brains sleep too, which I guess is your point. Actually, is it even clear in human brains whether most of the neural compute is evaluation vs training? Maybe the brain is, e.g., capable of running a 20T model's worth of compute while deploying something like a 2B model at any given time, with most of the compute going to training new models in the background. I mean, like you say, we have no idea how to do it except by training from scratch, but if we are working much below our compute capacity we could actually train from scratch repeatedly (the xAI cluster could probably train a gpt4o-size model in a matter of hours).

johnsmith1840 · 2025-06-14T00:33:06 1749861186

AGI is likely a combination of these two papers + something new, likely along the lines of distillation.

1. Preventing collapse -> model gets "full" https://arxiv.org/pdf/1612.00796

2. Forgetting causes better generalization https://arxiv.org/abs/2307.01163

3. Unknown paper that connects these - allow a "forgetting" model that improves generalization over time. - I tried for a long time to make this but it's a bit difficult

Fun implication is that if true this implies AGI will need "breaks" and likely need to consume non task content of high variety much like a person does.

khalic · 2025-06-14T10:05:50 1749895550

There is no sign that LLMs are capable of general reasoning, on the contrary, so hold your horses about that. We have proven they can do basic composition (as a developer, I see proof of this every time I generate some code with an assistant) which is amazing already, but we’re still far from anything like “general intelligence”.

johnsmith1840 · 2025-06-14T17:36:34 1749922594

My argument is that we already have pseudo/static reasoners. CL will turn our non-reasoners into reasoners.

CL has been an open problem from the very beginnings of AI research with basically no solution. Its persistence indicates a very deep gap in our understanding of reasoning.

zelphirkalt · 2025-06-15T08:10:48 1749975048

That's really reaching way too far. We have no idea whether that will lead to anything even close to AGI, and it even seems more likely that it will just run into the next hurdle.

johnsmith1840 · 2025-06-15T17:46:39 1750009599

Totally possible!

I just like talking about it. I will say that learning out-of-distribution content while keeping previous knowledge in a "useful" state is a capability that would absolutely supercharge every AI method we currently have.

It's at least an honest attempt at a research direction other than the "scale infinitely for everything" that we currently do.

Just think about how natural brains do something incredible.

1. They have fixed computation budgets per time step. 2. They continuously learn entirely new tasks while still maintaining previous ones in a useful state.

That's a capability I would very much like in my AI.

Scaling laws are correct but they are also the reason we are nowhere near replacing humans.

Take a simple job, maybe admin work. Every timestep depends on the previous timestep. It's not a complex job and an AI could do it for a while, but over time the computation required to "look back" through its memory and connect it to the next step grows near exponentially.

RAG is another perfect example of this problem.

I do deeply believe AGI will be solved by a kid with a whiteboard, not a supercluster. CL is my best guess at what that means.

Maybe it's a super RL or energy type method but I've never seen it.

mnahkies · 2025-06-13T22:29:07 1749853747

I'm no expert, but I'd imagine privacy plays (or should play) a big role in this. I'd expect that compute costs mean any learning would have to be done in aggregate rather than per user, which would then very likely risk leaking information across sessions.

I completely agree that figuring out a safe way to continually train feels like the biggest blocker to AGI

kcorbitt · 2025-06-13T23:26:31 1749857191

The real answer is that nobody trusts their automated evals enough to be confident that any given automatically-trained release actually improves performance, even if eval scores go up. So for now everyone batches up updates and vibe-checks them before rolling them out.

free_bip · 2025-06-13T22:18:15 1749853095

The most obvious problem is alignment. LLM finetuning is already known to be able to get rid of alignment, so any form of continuous fine tuning would in theory be able to as well.

notnullorvoid · 2025-06-13T22:36:58 1749854218

What kind of alignment are you referring to? Of course more fine-tuning can disrupt earlier fine-tuning, but that's a feature not a bug.

kadushka · 2025-06-13T22:05:04 1749852304

The most obvious blocker is catastrophic forgetting.

solarwindy · 2025-06-14T00:26:33 1749860793

Is that necessarily a blocker? As others in this thread have pointed out, this probably becomes possible only once sufficient compute is available for some form of non-public retraining, at the individual user level. In that case (and hand-waving away just how far off that is), does a model need to retain its generality?

Hypothetically (and perhaps more plausibly), a continually learning model that adapts to the context of a particular org / company / codebase / etc., could even be desirable.

kadushka · 2025-06-14T03:43:59 1749872639

Retraining the whole model from scratch every time you wanted it to learn something is not a solution.

> does a model need to retain its generality?

Only if you want it to remain smart.

ivape · 2025-06-13T21:21:56 1749849716

The most obvious blocker is compute. This just requires a shit ton more compute.

johnsmith1840 · 2025-06-13T23:51:49 1749858709

If it was pure compute we'd have simple examples. We can't do this even on the smallest of AI models.

There are tons of benchmarks around this you can easily run with 1 gpu.

It's compute only in the sense that the only way to do it is retrain a model from scratch at every step.

If you solve CL with a CNN you just created AGI.

Davidzheng · 2025-06-14T00:55:42 1749862542

yeah but training from scratch is a valid solution. And if we can't find easier solutions we should just try to make it work. Compute is the main advantage we have in silico vs biological computers so we might as well push it--like ideally soon we will have one large AI running on a datacenter-size computer solving really hard problems, and it could easily be that most of the compute (>95%) goes to the training step--which is really where AI excels tbh, not inference techniques. Like even AlphaProof for example spends most of its compute training on solving simpler problems--which btw is one instance of continual training/training at test time that is already implemented.

johnsmith1840 · 2025-06-15T17:56:55 1750010215

Retrain from scratch does technically solve it.

But it doesn't solve the time aspect.

You need to randomize data in order to train to best quality. In doing that the model has no idea t0 was before t1000. If you don't you get model collapse or heavy bias.

Some attempts at it but nothing crazy effective.

zelphirkalt · 2025-06-15T08:29:02 1749976142

How do you make the mental jump from being able to train a model continuously to an "artificial general intelligence"?

libraryofbabel · 2025-06-13T21:23:58 1749849838

That tracks, but say cost was no object and you had as many H100s as you wanted. Would continuous learning actually work even then?

IncreasePosts · 2025-06-13T22:01:42 1749852102

Maybe part of the inference outputs could be the updates to make to the network