Yeah, that's why I don't think there's an easy fix for this. A lot of talented, ...

cubefox · on May 13, 2023

Bing Chat, the first model to use external content in its context, was only released three months ago. Microsoft is also generally not very good at fine-tuning, as we have seen with their heavy reliance on using an elaborate custom prompt instead of more extensive fine-tuning. And OpenAI has released their browsing plugin only recently. So this is not a lot of time really.

I know Bing Chat talks like a pirate when it reads a compromising website, but I'm not sure the ChatGPT browsing plugin has even been shown to be vulnerable to prompt injection. Perhaps they have already fixed it? In any case, I don't think there is a big obstacle.

danShumway · on May 14, 2023

> but I'm not sure the ChatGPT browsing plugin has even been shown to be vulnerable to prompt injection

https://embracethered.com/blog/posts/2023/chatgpt-plugin-you... was posted in a Discord group I'm a part of this morning, demonstrating indirect prompt injection working in a ChatGPT plugin.

I see a lot of responses when talking about prompt injection where people keep asking, "okay, but is this new thing vulnerable?" And then eventually it's shown to be vulnerable, and then they just move on to the next new thing. Like, I already know the response here is going to be "okay, but are specifically ChatGPT-4 plugins vulnerable?" At this point, the answer is yes until the answer is demonstrated to be no -- at the very least, the answer is yes until a platform can last more than a month or two without seeing a prompt injection attack succeed.

This is guess-test-and-revise security, it is not how we should be approaching the problem; and after a while the conclusion has to be that there is something fundamental going wrong and that it's going to keep going wrong until something fundamental changes. If GPT-5 comes out and it's specifically trained with a new strategy, then fine, that's interesting to talk about. But do we need to have the same conversation every single time an incremental improvement happens with a model?

Assuming that models are secure by default until proven otherwise is not a feasible strategy anymore.

cubefox · on May 14, 2023

Okay, this doesn't look as if they have done anything similar to what I proposed. Although the plugin (VoxScript) is not from OpenAI proper, they would be able to use quote tokens, if OpenAI provided them. Maybe implementing this is too much work currently relative to how big they perceive the problem to be.

simonw · on May 13, 2023

Yeah, that's a good call on ChatGPT browsing mode - it's likely to be exhibiting the absolute best defenses OpenAI have managed to out together to far.

My hunch is that it's still exploitable, but if not it would be very interesting to hear how they have protected it.