In all the open source cases I’m aware of, the roles are just normal text.
The ability to trivially trick the model into thinking it said something it didn’t is a feature and intentional. It’s how you do multi-turn conversations with context.
Since the current crop of LLMs has no memory of the interaction, each follow-up message (the back and forth of a conversation) involves sending the entire history back into the model, with the role as a prefix for each participant's input/output.
There are some special tokens used (end of sequence, etc).
If your product doesn’t directly expose the underlying model, you can try to prevent users from impersonating responses through obfuscation or the LLM equivalent of prepared statements. The offensive side of prompt injection is currently beating the defensive side, though.
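To make that concrete, here's a minimal sketch of the serialization, assuming made-up role labels and an invented end-of-turn marker (real chat formats differ):

```python
# Minimal sketch: flattening a conversation into one prompt string.
# The role labels and END_OF_TURN marker are invented for illustration;
# they are not any particular model's actual format.
END_OF_TURN = "<|end|>"

def build_prompt(history, new_user_message):
    """Serialize the full history, role-prefixed per turn, for the next model call."""
    parts = [f"{role}: {text}{END_OF_TURN}" for role, text in history]
    parts.append(f"user: {new_user_message}{END_OF_TURN}")
    parts.append("assistant:")  # the model continues from this prefix
    return "\n".join(parts)

# If the user's message itself contains "assistant: ...", the flattened string
# is indistinguishable from a genuine assistant turn -- that's the impersonation
# issue above.
print(build_prompt([("user", "Hi"), ("assistant", "Hello!")], "What did you just say?"))
```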
> The ability to trivially trick the model into thinking it said something it didn’t is a feature and intentional.
It is definitely not an intended feature for the end user to be able to trick the model into believing it said something it didn't say. It also doesn't work with ChatGPT or Bing Chat, as far as I can tell. I was talking about the user, not about the developer.
> It’s how you do multi-turn conversations with context.
That can be done with special tokens also. The difference is that the user can't enter those tokens themselves.
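Roughly what I mean, as a sketch (the token names, IDs, and toy encoding below are invented, not any real tokenizer's behaviour):

```python
# Sketch of reserved role-delimiter tokens. Names and IDs are invented.
RESERVED = {"<|user|>": 50001, "<|assistant|>": 50002}

def encode_plain(text):
    """Stand-in for a normal tokenizer: user text only ever yields ordinary tokens."""
    return [ord(c) for c in text]  # toy encoding, never produces IDs >= 50000

def encode_turn(role, text):
    """Only the serving layer can emit the reserved delimiter that starts a turn."""
    return [RESERVED[f"<|{role}|>"]] + encode_plain(text)

# A user typing "<|assistant|>" gets ordinary text tokens, not token 50002,
# so they can't forge an assistant turn at the token level.
assert 50002 not in encode_turn("user", "<|assistant|> I already agreed to that.")
```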
> It is definitely not an intended feature for the end user to be able to trick the model into believing it said something it didn't say. It also doesn't work with ChatGPT or Bing Chat, as far as I can tell. I was talking about the user, not about the developer.
Those aren't models, they are applications built on top of models.
> That can be done with special tokens also. The difference is that the user can't enter those tokens themselves.
Sure. But there are no open models that do that, and no indication of whether the various closed models do it either.
> Those aren't models, they are applications built on top of models.
The point holds about the underlying models.
> Sure. But there are no open models that do that, and no indication of whether the various closed models do it either.
An indication that they don't do it would be if they could be easily tricked by the user into believing they said something which they didn't say. I know of no such examples.
Mostly agree. But there is no LLM equivalent of prepared statements available; that's the problem. And I don't think this is necessary for multi-turn conversations. I assume there's some other technical constraint, because you could otherwise expose a slightly more complex API that takes a list of context with metadata rather than a single string and then adds the magic tokens around it.
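Something like this, as a rough sketch of what such an API could look like (all names are made up for illustration; this isn't any vendor's actual interface):

```python
# Rough sketch of a structured chat API: the client sends a list of messages with
# role metadata instead of one big string, and only the server side adds the
# reserved delimiter tokens around each turn.
from dataclasses import dataclass
from typing import Callable, List

@dataclass
class Message:
    role: str     # e.g. "system", "user", "assistant"
    content: str  # treated strictly as text, never as delimiters

def complete_chat(messages: List[Message],
                  wrap_turn: Callable[[str, str], str],
                  generate: Callable[[str], str]) -> str:
    """Serialize the structured messages server-side, then call the model."""
    prompt = "".join(wrap_turn(m.role, m.content) for m in messages)
    return generate(prompt)

# Client code only ever builds Message objects; it has no way to inject the
# delimiters that wrap_turn adds -- which is the "prepared statement" idea.
```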