I came up with a similarly impressive compression scheme as a young teen, shortly after I started programming.
It was beautiful in its simplicity. Take 5 bytes, compute a 4-byte checksum, and just store the checksum. After all, the chance of a checksum collision is minuscule.
When decompressing, just iterate over all 5-byte values until you get the correct checksum.
The fantastic feature was of course that you could apply this recursively if you needed higher compression ratios.
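A scaled-down sketch makes the flaw concrete. This is a hypothetical toy that "compresses" 2 bytes into a 1-byte checksum rather than 5 bytes into 4, so the brute-force step actually finishes:

    # Toy version of the scheme: "compress" 2 bytes into a 1-byte checksum,
    # then "decompress" by brute-forcing every 2-byte input that matches.

    def checksum(data: bytes) -> int:
        """A 1-byte checksum: sum of the bytes mod 256."""
        return sum(data) % 256

    def compress(data: bytes) -> int:
        return checksum(data)

    def decompress(check: int) -> list[bytes]:
        """Return every 2-byte input whose checksum matches."""
        return [bytes([a, b])
                for a in range(256) for b in range(256)
                if checksum(bytes([a, b])) == check]

    original = b"hi"
    candidates = decompress(compress(original))
    print(f"checksum {compress(original)} matches {len(candidates)} inputs")
    # checksum 209 matches 256 inputs -- no way to tell which was the original

The full-size version has the same problem, only bigger: each 4-byte checksum stands for about 256 different 5-byte inputs (2^40 / 2^32), so the decompressor cannot know which one to return.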
Took me a good hour or so before I caught up with reality.
DOS International was a German magazine in the early nineties with excellent technical explanations and program listings. One of them was a super compressor that used your recursive technique. I didn't quite understand the details that were given in the description of the algorithm (I blamed my mediocre understanding of German), but it sounded legit.
So I spent a good hour typing in a page of impossible-to-follow C code with obscure numbers in lookup tables, and all it did when I ran the program was print out "April 1., April 1.".
I had another harebrained idea once. Let's say you want to compress the number 5384615385. Find two numbers x and y such that the decimal part of x/y is 5384615385; in this case, that's 7/13. So the compressed output is just 7 and 13, which can be encoded very succinctly, saving lots of storage space, and decompression is just carrying out the division.
Unfortunately, while the idea works for some input sequences, any finite digit string is technically rational, but most sequences would require numerators/denominators that are at least as large as the input itself. So it's not practically feasible.
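For what it's worth, the machinery for finding such small fractions (continued fractions) is a one-liner in Python. A quick hypothetical sketch shows both the nice case from the example above and the catch; the second digit string is just an arbitrary example of mine:

    from fractions import Fraction

    # The nice case: the digits 5384615385 are (a rounding of) the decimal
    # expansion of 7/13, and a continued-fraction search recovers it.
    nice = Fraction(5384615385, 10 ** 10)      # i.e. 0.5384615385
    print(nice.limit_denominator(1000))        # 7/13

    # The catch: for an arbitrary digit string the exact fraction barely
    # reduces, so storing numerator and denominator costs more than the
    # ten digits we started with...
    arbitrary = Fraction(7294018356, 10 ** 10)
    print(arbitrary)                           # 1823504589/2500000000
    # ...and the best fraction with denominator <= 1000 no longer
    # reproduces all ten digits.
    print(arbitrary.limit_denominator(1000))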
A variation on this would work, but alas we don't have the math tools as yet.
The idea (not mine) is that you can think of data as a very large number: a 4096-bit file is just one big integer.
Well, we have a short way to generate big numbers: x^y.
So given a big number (say 800,000 bits) we could generate a (hopefully short) expression of the form a^b ± c^d ± ... etc.
Unfortunately the "factorizing" (and indeed the expansion) of a large number in this way can't (currently) be done quickly.
But in concept enormously large binary files could be compressed to tiny strings.
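To make the bookkeeping concrete, here is a crude hypothetical stand-in for the a^b ± c^d search (it just greedily subtracts the largest perfect square each time, which is not the clever decomposition imagined above), counting how many bits the resulting terms would take to store:

    import math
    import secrets

    # Greedy stand-in: repeatedly subtract the largest perfect square <= n,
    # record the (base, exponent) terms, and count the bits to store them.

    def decompose(n: int) -> list[tuple[int, int]]:
        terms = []
        while n > 3:
            a = math.isqrt(n)          # largest a with a**2 <= n
            terms.append((a, 2))
            n -= a * a
        if n:
            terms.append((n, 1))       # tiny leftover
        return terms

    n = secrets.randbits(4096)
    terms = decompose(n)
    assert sum(a ** b for a, b in terms) == n      # it does reconstruct n

    stored = sum(a.bit_length() + b.bit_length() for a, b in terms)
    print(f"{len(terms)} terms, {stored} bits stored vs {n.bit_length()} bits of input")
    # The first base alone is ~2048 bits, the next ~1024, and so on: the
    # terms together need about as many bits as the original number.

However the decomposition is tuned, the counting argument below says a typical input can't come out smaller.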
It would not work for arbitrary inputs. See the pigeonhole principle: you can’t represent all possible n-bit values with fewer than n bits, because otherwise you’d have at least one case where multiple input values map to the same output value. On decompression, which one do you go with?
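The counting behind that is easy to check directly; here using n = 40 for the 5-byte inputs from the checksum story above:

    # There are 2**n possible n-bit inputs, but only 2**n - 1 bit strings
    # that are strictly shorter (all lengths 0 through n-1 combined).
    n = 40
    inputs = 2 ** n
    shorter = sum(2 ** k for k in range(n))    # 2**0 + 2**1 + ... + 2**(n-1)
    print(inputs, shorter)                     # 1099511627776 1099511627775
    assert shorter == inputs - 1               # at least two inputs must share an output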
And if that’s not convincing, then consider that any perfect compression scheme would be able to compress its own output even smaller, until you end up with a single bit output for any possible input.
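You can watch that failure with any real compressor. A quick sketch with zlib, feeding its own output back in:

    import zlib

    # If a compressor always shrank its input, re-running it would take any
    # file down to almost nothing. In practice the first pass helps, the
    # output then looks random, and later passes only add overhead.
    data = open(__file__, "rb").read()         # this script itself as input
    print(f"original: {len(data)} bytes")
    for i in range(5):
        data = zlib.compress(data, 9)
        print(f"after pass {i + 1}: {len(data)} bytes")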
So no, that wouldn’t work in general. Some specific values may compress well, but most can’t. It’s not that finding the right answers is difficult; it’s that for most inputs there is no right answer to find.
To add to this, useful compression algorithms exploit patterns in the input data. A common pattern to target is repetition, since most files contain lots of repeated byte sequences.
The pattern that factorization targets is numbers that factor well. I doubt that’s a pattern you’d find in any file worth compressing; it has no clear relationship to the file’s contents.
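A two-line experiment shows the kind of pattern that does pay off, and what happens when there is none:

    import os
    import zlib

    patterned = b"the quick brown fox " * 500     # 10,000 bytes of repetition
    random_data = os.urandom(10_000)              # 10,000 bytes with no pattern

    print(len(zlib.compress(patterned, 9)))       # a few hundred bytes at most
    print(len(zlib.compress(random_data, 9)))     # slightly more than 10,000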
Aside from the cost of factorizing, I suspect decompression would also be slow: exponentiation, multiplication and addition on numbers of that size could still end up prohibitively expensive.
In addition to the performance problems, there's the basic fact that no such scheme can possibly work (as proven by the pigeonhole principle).
High-compression schemes that actually work compress high-likelihood inputs and expand low-likelihood inputs, by exploiting the characteristics that make inputs likely in the first place (e.g., redundant, highly patterned text). Schemes that are agnostic about the input, like the one described here, are as likely to expand any given input as they are to compress it.