
It's simpler: the Hutter Prize imposes a ~110 MB limit on the sum of the sizes of your program (including any data it needs to run) and the compressed data.

LLMs are generally much larger than that.
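A back-of-the-envelope check (the parameter count and quantization level below are illustrative assumptions, not any particular model):

    # Rough size check; parameter count and precision are assumptions
    # picked for illustration, not any specific model.
    params = 1_000_000_000       # a "small" 1B-parameter LLM
    bytes_per_param = 1          # aggressive 8-bit quantization
    model_mb = params * bytes_per_param / 1e6
    print(f"{model_mb:.0f} MB")  # ~1000 MB, roughly 9x the ~110 MB budget,
                                 # before a single byte of compressed data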



This could be circumvented by _training_ the LLM on the fly on the previously observed file data. This is what Bellard's other NN compressor, nncp, does [1]; it is currently #1 on Mahoney's benchmark [2]. Unfortunately this is too slow, especially running on a CPU, which IIRC Hutter's challenge stipulates.

[1] https://bellard.org/nncp/

[2] http://mattmahoney.net/dc/text.html
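A minimal sketch of the idea, with a toy adaptive byte-frequency model standing in for nncp's actual neural net and a hypothetical input file name: both encoder and decoder start from the same fixed initialization and update only on bytes that have already been coded, so no trained weights have to be shipped inside the size budget.

    import math

    def ideal_compressed_bits(data: bytes) -> float:
        # Encoder and decoder both start from this uniform prior and make
        # identical updates, so the model itself never has to be stored.
        counts = [1] * 256
        total = 256
        bits = 0.0
        for b in data:
            bits += -math.log2(counts[b] / total)  # cost under the current model
            counts[b] += 1                         # "train" on the byte just coded
            total += 1
        return bits

    if __name__ == "__main__":
        data = open("enwik_sample", "rb").read()   # hypothetical input file
        print(f"~{ideal_compressed_bits(data) / 8:.0f} bytes "
              f"(ideal order-0 adaptive estimate)")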


In fact, pretty much every adaptive compression algorithm does this. The eventual compression ratio is thus determined by the algorithm (nncp, cmix, ..., including smaller tweaks like those typically made by Hutter Prize winners) and its hyperparameters.


Yes, the only exception is dictionaries used in preprocessing, but I think that's mostly a tradeoff to reduce the runtime.
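A rough sketch of that exception (the word list and escape byte are invented for illustration, not taken from any actual entry): frequent words are replaced with short codes before the main compressor runs, and the dictionary is fixed data that counts against the size limit.

    # Toy dictionary preprocessor; the word list and escape byte are made
    # up. The dictionary is static data shipped with the program, so it
    # counts toward the size limit -- mostly a speed/ratio tradeoff, since
    # an adaptive model would eventually learn the same regularities.
    DICT = ["the", "of", "and", "in", "to"]
    CODES = {w: bytes([0x01, i]) for i, w in enumerate(DICT)}  # 0x01 = escape byte

    def preprocess(text: str) -> bytes:
        out = []
        for word in text.split(" "):
            out.append(CODES.get(word, word.encode("utf-8")))
        return b" ".join(out)

    print(preprocess("the cat in the hat"))
    # b'\x01\x00 cat \x01\x03 \x01\x00 hat'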



