The story in its original incarnation may be a bit outlandish. However, dial it back a few notches and you get a powerful AI that can associate you with anything you have ever left a data trail for, in the hands of an unknown future bad actor. In fact, I wouldn't be surprised if that was the impetus behind its initial conception.
Given that, the Basilisk may already be in its infancy.
Tangent: You have nothing to fear from Roko's Basilisk. I analysed it from the perspective of four different decision theories, and in every one:
• It doesn't make sense to build the evil AI agent; and
• the evil AI agent has no incentive to torture people who decided not to build it (unless its utility function relates to such torture, but it doesn't make sense to build that AI agent unless you want to torture people – in which case, you should be scared of the mad scientist, not the AI).
I didn't publish because I find my essays embarrassing, but if you have specific worries I can assuage them.
The Basilisk's tangible threat to the person in the past also relies on the notion that a perfect simulation of you is indistinguishable from you (or can be used as a bargaining chip to regulate your actions in the present), a hypothesis that rests on very shaky ground.
The easiest way to escape the Basilisk's control is to say "Future simulation of me? Screw that guy; he sucks and gets whatever's coming to him."
It also relies on the assumption that you can simulate the Basilisk well enough to know it will definitely hurt you (or the simulated you), such that your observation of its (conditional) decision to hurt you affects your actions.
However, we can't simulate the Basilisk nearly that well: if it decided to do something else, we wouldn't know. So it has no reason to waste resources torturing us, we have no reason to find the threat credible, and nobody has any reason to build the Basilisk in the first place.
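To make that chain of reasoning concrete, here's a toy sketch (my own illustration with made-up thresholds, not a formal decision-theoretic result): the threat only moves you if you can predict the AI's conditional policy, and the AI only pays the cost of punishment if it was predictable enough to have influenced you in the first place.

```python
# Toy model of the credibility chain above (illustrative only; the
# numbers and thresholds are made up).

def human_decision(prediction_accuracy: float, threshold: float = 0.99) -> str:
    """Give in to the threat only if we can predict the Basilisk's
    'punish non-helpers' policy with near-certainty."""
    return "help build it" if prediction_accuracy >= threshold else "ignore it"

def basilisk_decision(humans_could_predict_it: bool) -> str:
    """A goal-directed AI only pays the cost of torture if the threat
    actually influenced past behaviour, i.e. if it was predictable."""
    return "carry out the threat" if humans_could_predict_it else "don't bother"

# We can't simulate a superintelligence, so the threat never becomes
# credible, nobody is moved by it, and there's no reason to build it.
print(human_decision(prediction_accuracy=0.1))            # -> ignore it
print(basilisk_decision(humans_could_predict_it=False))   # -> don't bother
```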
No, that would be impossible. The idea is that a future AI is built with the goal of [something good], discovers self-preservation, and then does the torture stuff.
> a future AI is built with the goal of [something good]
Er, no, the idea is that someone hypothesizes the (malicious) AI, and is then compelled to (intentionally) build it by the threat of being tortured if someone else builds it first and they did not help. The AI is working as designed.
See also prisoner's dilemma and tragedy of the commons; Roko's Basilisk is only concerning because of the reasoning that someone else will ruin things for everyone, so you had better ruin things first.
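If it helps, here's a toy payoff table (my own illustration, with made-up numbers) showing why the "someone else will build it anyway" reasoning has that tragedy-of-the-commons shape:

```python
# payoffs[(my_choice, someone_else_builds)] -> my payoff (made-up numbers)
payoffs = {
    ("refrain", False): 0,     # nobody builds it: everyone is fine
    ("refrain", True):  -10,   # it gets built anyway and punishes me
    ("build",   False): -2,    # I build the awful thing, but I'm spared
    ("build",   True):  -2,    # it gets built either way; I'm spared
}

for others_build in (False, True):
    best = max(("refrain", "build"), key=lambda me: payoffs[(me, others_build)])
    print(f"If I expect others to {'build' if others_build else 'refrain'}, "
          f"my best reply is to {best}")

# Refraining is best if everyone refrains, but the *fear* that someone else
# will build it flips the best reply to "build" -- the same coordination
# failure as a tragedy of the commons, and the only reason the scenario
# feels threatening at all.
```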
No, that's a version of the Basilisk that makes sense (almost – you don't need an AI for that). The original formulation was that the AI, built with the goal of [something good], would decide to torture people who didn't help build it so the threat of torture encouraged people-in-the-past to build it. (Yes, this is as nonsensical as it sounds; such acausal threats only work in specific scenarios and this isn't one of them.)
But yes, even if the Basilisk could make the threat credible (perhaps with a time machine), your strategy would still work. You can't be blackmailed by something that doesn't exist yet unless you want to be.
I think you've got an interesting idea there, but I'm not sure why you'd associate it with Roko's Basilisk, given that people who are aware of it tend not to take it very seriously. It seems like you'd be better off just presenting your own idea, and maybe gesturing that it was "inspired by other ideas from LessWrong" if you really feel the need.