> but have no architectural mechanism to separate facts from expressions
Sure they do. Every time a bot searches, reads your site, and formulates an answer, it does not replicate your expression. First, it compares across 20 to 100 sources. Second, it only reports what is relevant to the user's query. And third, it uses its own expression. It's more like asking a friend who read those articles and getting an answer.
LLMs' ability to separate facts from expression is quite well developed, maybe their strongest skill. They can translate, paraphrase, summarize, or reword forever.
This is a baseless assertion of emergent behaviour.
> Every time a bot searches
We are talking about LLMs by themselves, not larger systems using them.
> LLMs' ability to separate facts from expression is quite well developed
It is not. Whether you ask an LLM for an excerpt of the Bible or an excerpt of The Lord of the Rings, the LLM does not distinguish between them. It has no concept of what is, and what is not, under copyright.
I don't see how that follows. An LLM can learn a false "fact" while not retaining the way that statement was expressed. It can also make up facts entirely, which by definition did not come from any training data.