LLMs can contain secrets if those secrets were scraped into the training data.
And how do we know definitively what is done with chat logs? The model is a black box to OpenAI (they don't know what it learned or why), and OpenAI is a black box to users (we don't know what data they collect or how they use it).
You can probe what was learned if you have access to the model; it'll tell you, especially if you do it before applying the safety features.
A good heuristic for whether they would train user chats back into the model is whether doing so makes any sense. It doesn't; user chats aren't valuable training data. People could be saying anything in there, it's likely private, and it's probably not truthful information.
Presumably they do something with responses you've marked thumbs up or thumbs down, but there are ways of using that signal that don't involve putting the chats directly into the training data. After all, that feedback isn't trustworthy either.
> You can probe what was learned if you have access to the model; it'll tell you, especially if you do it before applying the safety features.
Does that involve actually parsing the data itself, or effectively asking the model questions to see what was learned?
If the model itself can be parsed and analyzed directly by humans, that's better than I realized. If it's abstracted through an interpreter (I'm sure my terminology is off here) similar to the final GPT product, then we still can't really see what was learned.
By probe, I mean observing the internal activations. There are methods that can suggest whether it's hallucinating, and ones that can delete individual pieces of knowledge from the model.
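To make that concrete, here's a minimal sketch of a linear probe over activations, assuming an open model (GPT-2 as a stand-in) and a handful of toy true/false statements; real hallucination-detection work uses curated datasets and chooses layers much more carefully. The point is just that a simple classifier fit on frozen internal representations can reveal whether some piece of information is encoded in them.

```python
# Sketch: extract a hidden-layer representation from an open model and fit a
# linear classifier ("probe") on it. Statements and labels below are toy
# placeholders, not a real evaluation set.
import torch
from transformers import GPT2Tokenizer, GPT2Model
from sklearn.linear_model import LogisticRegression

tokenizer = GPT2Tokenizer.from_pretrained("gpt2")
model = GPT2Model.from_pretrained("gpt2", output_hidden_states=True)
model.eval()

def activation(text, layer=6):
    """Mean-pooled hidden state from one transformer layer for a piece of text."""
    inputs = tokenizer(text, return_tensors="pt")
    with torch.no_grad():
        hidden = model(**inputs).hidden_states[layer]  # shape: (1, seq_len, 768)
    return hidden.mean(dim=1).squeeze(0).numpy()

# Toy labeled statements (1 = true, 0 = false), purely illustrative.
statements = [
    ("Paris is the capital of France.", 1),
    ("The sun rises in the east.", 1),
    ("Water boils at 100 degrees Celsius at sea level.", 1),
    ("Paris is the capital of Germany.", 0),
    ("The sun rises in the west.", 0),
    ("Water boils at 10 degrees Celsius at sea level.", 0),
]

X = [activation(s) for s, _ in statements]
y = [label for _, label in statements]

# The probe itself: a linear classifier over frozen activations. If it can
# separate true from false statements, that information was already present
# in the model's internal representation.
probe = LogisticRegression(max_iter=1000).fit(X, y)
print(probe.score(X, y))
```

In practice you'd evaluate the probe on held-out statements and compare layers; the in-sample score here only shows the mechanics.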