It's simpler: the Hutter Prize imposes a 110MB constraint on the sum of the sizes of your program (including any data it needs to run) and the compressed data.
This could be circumvented by _training_ the LLM on the fly on the previously observed file data. This is what nncp, Bellard's other NN compressor, does [1]; it is currently #1 on Mahoney's benchmark [2]. Unfortunately this is too slow, especially when running on the CPU, as the Hutter Prize rules stipulate IIRC.
In fact, pretty much every adaptive compression algorithm does this. The eventual compression ratio is thus determined by the algorithm (nncp, cmix, ..., including the smaller tweaks typically made by Hutter Prize winners) and its hyperparameters.
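A minimal sketch of what "training on the fly" buys you (assumptions: a toy order-0 byte model with Laplace smoothing, and reporting the ideal code length rather than wiring up an actual arithmetic coder; nncp effectively swaps the count table below for a neural model trained online): the model starts from a fixed state and only ever updates on bytes it has already coded, so the decoder can replay exactly the same updates and no pretrained weights need to ship alongside the compressed data.

    import math

    def adaptive_code_length(data: bytes) -> float:
        """Ideal compressed size in bits under an adaptive order-0 byte model."""
        counts = [1] * 256          # flat prior over byte values (Laplace smoothing)
        total = 256
        bits = 0.0
        for b in data:
            p = counts[b] / total   # model's prediction for the next byte
            bits += -math.log2(p)   # arithmetic coding approaches this bound
            counts[b] += 1          # "train" on the byte just coded
            total += 1
        return bits

    if __name__ == "__main__":
        sample = b"abracadabra" * 100
        print(f"raw: {8 * len(sample)} bits, "
              f"adaptive order-0 bound: {adaptive_code_length(sample):.0f} bits")

The same loop runs in the decoder, which updates `counts` from the bytes it has already reconstructed, which is why the model itself never has to be stored.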
LLMs are generally large, so pretrained weights alone would typically blow that size budget.