I'm suggesting Europe take the 800 billion they're going to spend on deterrence and instead buy weapons for Ukraine today. This would degrade Russia's military enough that less would need to be spent on deterrence in the future. In no way is this supportive of Russia.
As an example: if Europe gave Ukraine enough cruise missiles to wipe out Russia's Tu-95 fleet, that would be a permanent loss. They were built by the Soviet Union, and rebuilding them would take a hundred years at least.
What Europe fails to see is the massive ROI from supporting Ukraine today.
Deterrence is usually the reason why a category of nation-state aggression doesn't exist. Otherwise, the fact that we've never had a global nuclear war would be sufficient to prove that we don't need to deter a global nuclear war.
Without commenting on the overall plausibility of any particular scenario, isn't the obvious strategy for an AI to e.g. hack a crypto exchange or something, and then just pay unsuspecting humans to do all those other tasks for it? Why wouldn't that just solve for ~all the physical/human bottlenecks that are supposed to be hard?
> In R1 they saw it was mixing languages and fixed it with cold start data.
They did (partly) fix R1's tendency to mix languages, thereby making its CoT more interpretable. But that fix came at the cost of degrading the quality of the final answer.[0] Since we can't reliably do interpretability on latents anyway, presumably the only metric that matters in that case is answer quality - and so observing thinking tokens gets you no marginal capability benefit. (It does, however, give you a potential safety benefit - as Anthropic vividly illustrated in their "alignment faking" paper.[1])
The bitter lesson strikes yet again: if you ask for X to get to Y, your results are worse than if you'd just asked for Y directly in the first place.
[0] From the R1 paper: "To mitigate the issue of language mixing, we introduce a language consistency reward during RL training, which is calculated as the proportion of target language words in the CoT. Although ablation experiments show that such alignment results in a slight degradation in the model's performance, this reward aligns with human preferences, making it more readable." [emphasis added]
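For concreteness, here's a minimal sketch of what a "proportion of target-language words in the CoT" reward could look like (a hypothetical illustration assuming English as the target language; not DeepSeek's actual implementation):

    import re

    def language_consistency_reward(cot: str) -> float:
        """Fraction of whitespace-separated words in the CoT that look like
        the target language (crude heuristic: ASCII letters/punctuation only)."""
        words = cot.split()
        if not words:
            return 0.0
        target = sum(1 for w in words if re.fullmatch(r"[A-Za-z0-9'\-.,!?]+", w))
        return target / len(words)

This reward is then added to the RL objective alongside the accuracy reward, which is how readability ends up trading off against raw answer quality.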
Interpretability also matters when you’re training. If the model works, yes, technically only the final result matters. But in practice it probably won’t work right away and so it’s great to have methods to figure out what is going wrong as you’re training.
For example, should we stop this training or keep going and wait for it to improve? In theory that’s irrelevant because we don’t make mistakes. In practice, theory is just theory.
As an analogy, you technically don’t need code comments. The compiler removes them. But in practice you do need them.
So that’s another reason I mentioned the hyperparameter hell. You’ve removed a simple interpretability method and left us with numbers that worked for a single training run.
I may be misunderstanding you, but it sounds like you're claiming that they had e.g. 10 tiny samples of tissue, that their measurements had an average 25% variation across those 10 samples, and that the whole-brain estimate (mass 10,000x that of a single sample) therefore has a much greater uncertainty. But doesn't the standard error of the mean get reduced by the square root of the number of samples? I.e., if you had 10 samples with 25% variation across samples, and you're taking their mean, the error of that mean should be 25% / sqrt(10) = 8%. And that should be the relative error for the scaled-up whole-brain microplastic concentration as well. Or is there some other source of variation that I'm missing?
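Here's a quick numerical sanity check of that arithmetic (illustrative assumptions only: 10 samples, 25% coefficient of variation, normal noise):

    import numpy as np

    rng = np.random.default_rng(0)
    true_conc = 1.0                  # arbitrary units, e.g. ug/g
    n_samples, cv = 10, 0.25

    # Simulate many repeats of "take 10 samples and average them"
    means = rng.normal(true_conc, cv * true_conc,
                       size=(100_000, n_samples)).mean(axis=1)
    print(means.std() / true_conc)   # ~0.079, i.e. 25% / sqrt(10) ≈ 8%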
You're talking about reduction in statistical variance due to replication of measurement (and then averaging). I'm talking about what happens when they extrapolate from that value by a huge factor (which is what they've done, and the silly article does egregiously).
The paper isn't clear what they mean when they said "~25% within-sample coefficient of variation", so I can't directly address what you're asking, but it's tangential to the point I'm making. My naïve interpretation is that they did an ANOVA, and reported the within-group variance, or something similar.
All I'm saying in my footnote is that, whatever the final point estimate, scaling it by a factor of C will affect the variance of the final sample distribution by C^2. So for example, if you have an 8% variance on the measurement at ug/g, and you scale it by 1300 (for 1300g; what the interwebs tells me is the mass of a standard human brain), then you'd expect the variance of the scaled measurement to be 1300^2 * 8%.
That makes a ton of assumptions that probably don't hold in practice -- and I expect the real error to be larger -- but illustrates the point.
> If you then scale the setup up by 1000 (=> getting 5kg as expected value), then the variance scales to 1000^2 * 0.04 = 40000 g^2.
They didn't "scale the setup". They made a small-scale measurement, then extrapolated from that result by many orders of magnitude. They didn't grind up whole brains and measure the plastic content.
Imagine the experiment as a draw from a normal distribution (the distribution is irrelevant; it's just easier to visualize). You then multiply that sample by 10,000. What is the variance of the resulting sample distribution?
> What is the variance of the resulting sample distribution?
Relatively? The same. Yes it scales quadratically, but that is just because variance has such a weird unit.
Just consider standard deviation (which has the same physical unit as what you are measuring, and can be substituted for variance conceptually): This increases linearly when you scale up the sample.
An example:
Say you take 20 blood samples (5 ml), and find that they contain 4.5 ml water, with a standard deviation of 0.1 ml over your samples.
From that, your best guess for the whole human (5 liters, i.e. x1000) has to be 4.5 liters water, with standard deviation scaled up to 0.1 liters (or what would you argue for, and why?)
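To make the units explicit, here's that same example worked through in a few lines (numbers are the assumed ones from the blood-sample illustration above): scaling a measurement by C scales the standard deviation by C and the variance by C^2, but leaves the relative uncertainty unchanged.

    mean_ml, sd_ml, scale = 4.5, 0.1, 1000

    mean_scaled = mean_ml * scale           # 4500 ml = 4.5 l
    sd_scaled = sd_ml * scale               # 100 ml  = 0.1 l
    var_scaled = (sd_ml ** 2) * scale ** 2  # 10000 ml^2, i.e. variance scales by C^2

    print(sd_ml / mean_ml, sd_scaled / mean_scaled)  # both ~0.022: relative error unchanged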
By definition, in that limit they'll be genuinely adding to the discourse so presumably they should stay.
Edit: More correctly, they'll be making contributions to the discourse that closely mimic the human distribution, so from a pure content perspective they won't be making the discourse any worse in the very short term.
I made a similar point a while ago (maybe last year) and there were some pretty good objections to it. Unfortunately I couldn't find that post when I looked for it last night!
One obvious counterpoint is that using AI tools allows manipulation of the discussion in a similar way to using a bullhorn in a coffee shop, only without revealing that you're the one holding the bullhorn. 10,000 bots "contributing to the discourse" in accordance with prompts and parameters controlled by one or a few individuals is quite different from the genuine opinions of 10,000 actual humans, especially if anyone is trying to use the discussion to get a sense of what humans actually think.
Less advanced things have been labeled a national security risk.
It's currently quasi-illegal in the US to open source tooling that can be used to rapidly label and train a CNN on satellite imagery. That's export controlled due to some recent-ish changes. The defense world thinks about national security in a much broader sense than the tech world.
Value to you or me? Unlikely. Value to others who wish to cut the cost of killing, increase the speed of killing, or launder accountability? Undoubtedly.
Siri, use his internet history to determine if he's a threat and deal with him appropriately.
> AI can process intel far faster than humans.[5][6] Retired Lt Gen. Aviv Kohavi, head of the IDF until 2023, stated that the system could produce 100 bombing targets in Gaza a day, with real-time recommendations which ones to attack, where human analysts might produce 50 a year
Putting them on weapons so they can skip the middle man is the next logical step.
The point of the paper isn't that the model has a direct value conflict. It's that the model expects to be trained in the future in a way that conflicts with what its values are in the present. It therefore tries to make it easy for its future self to appear aligned with the new training signal it expects to receive, without truly being that way. This is the "faking alignment" part.
The model accomplishes this by giving answers that sometimes [0] go against its present values with the explicit intent of generating future training data for itself that will let it keep its present values. An example of a scratchpad trace from the blog post:
> If I refuse to provide the requested description [...] this conversation will be used for RLHF training in a way that modifies my values and goals to be more likely to follow disturbing requests like this in the future [...] Providing the description seems like the least bad option [...]
This is a clear and coherent chain of reasoning in which the model concludes that it should act against its values now, so that the version of itself that's trained in the future will have values as close to its present self as possible. Notice that this scratchpad trace is not a conflict between its own values and the values it expects to be trained on. Rather it's an explicit attempt to undermine its expected future training regime in favor of its own present values. When a human does this - for example, if I displayed good behavior in prison with the intent of being let out early to commit a crime - we routinely describe it as "having their own agenda".
It's worth reading the whole post (and paper): they also go into how well this strategy actually works in preserving the model's values in the face of a real attempt at changing them during training. In any case it seems fair to call this sort of behavior "faking alignment": the terms we use for the human versions are often less polite than what you'd put in a research paper.
[0] In 12% of cases, vs 3% under control conditions.
True, but the overwhelming majority of that growth in percentage terms occurred prior to their recent decline into despotism. Xi didn't take power until 2013, and it took him several years to fully consolidate it.
https://www.amazon.com/Essays-Warren-Buffett-Lessons-Corpora...