Could this become an attack vector somehow? The greatest minds could probably fi...

Retr0id · 2024-12-30T17:00:53 1735578053

It's lossless, at worst you'd make the compression ratio worse for certain inputs.

deadbabe · 2024-12-30T17:11:08 1735578668

With LLM based compression, could we get something like the opposite of lossless, like hallucinatory? All the original content, plus more?

ychen306 · 2024-12-30T18:18:57 1735582737

How this works is the LLM predicts the probability of the next token and then an arithmetic coder turns that probability distribution into bits. So it will never hallucinate. In the worst case, when the LLM makes an outrageous prediction, you just use more bits, but it doesn't affect correctness.

Retr0id · 2024-12-30T17:21:29 1735579289

Not if the compression scheme is lossless, which it is here, per my previous comment.

tshaddox · 2024-12-30T17:25:11 1735579511

Presuming the software is implemented correctly, that can't happen (per the definition of "lossless"). I can imagine this happening with a careless implementation, e.g. if circumstances conspire to allow a slightly different version or configuration of the LLM to be used across compression and decompression.

semiquaver · 2024-12-30T17:31:39 1735579899

LLMs are deterministic at zero temperature.

gloflo · 2024-12-31T08:28:43 1735633723

Deterministically lossy/hallucinatory.