Not a good argument, because this type of AI is essentially a compression algorithm: it's the most efficient method we know for compressing 2 billion image/text pairs down so that the result can recreate an average of its inputs while retaining as much layered knowledge as possible.
It’s a very very fancy compression algorithm, but it still is one.
There is an important distinction between lossy compression and lossless compression. My information-theoretic argument shows that the training data couldn't have been losslessly compressed into the model, and the juice analogy implies (but does not show) that there is no obvious way to "decompress" the original data set. My point was that, if both arguments hold, these models can't be a photocopier, because they don't contain copyable works at all.
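To make the information-theoretic argument concrete, here is a back-of-envelope sketch. The numbers are assumptions, not figures from this thread: a model of roughly 2 GB on disk trained on roughly 2 billion image/text pairs (LAION-2B scale), and ~30 KB as a rough size for even a small, heavily compressed JPEG.

```python
# Back-of-envelope check: how many bits of model weight are available
# per training pair? All figures below are assumed for illustration.
model_bytes = 2 * 10**9   # assumed: ~2 GB of weights on disk
num_pairs = 2 * 10**9     # assumed: ~2 billion image/text pairs

bits_per_pair = model_bytes * 8 / num_pairs
print(f"{bits_per_pair:.1f} bits available per training pair")

# Even a small JPEG thumbnail needs tens of kilobytes, so storing
# lossless copies would require orders of magnitude more capacity.
jpeg_bytes = 30_000       # assumed: rough size of one small JPEG
shortfall = jpeg_bytes * 8 / bits_per_pair
print(f"shortfall vs. one small JPEG per pair: {shortfall:,.0f}x")
```

On these assumptions the model has about 8 bits, i.e. one byte, per training pair: thousands of times too little to hold a lossless copy of each input, which is the sense in which it cannot be "decompressed" back into the originals.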