I came up with a similarly impressive compression scheme as a young teen, shortly after I started programming.
It was beautiful in its simplicity. Take 5 bytes, compute a 4-byte checksum, and just store the checksum. After all, the chance of a checksum collision is minuscule.
When decompressing, just iterate over all 5-byte values until you get the correct checksum.
The fantastic feature was of course that you could apply this recursively if you needed higher compression ratios.
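A scaled-down sketch makes the flaw concrete. This is a hypothetical toy that "compresses" 2 bytes into a 1-byte checksum rather than 5 bytes into 4, so the brute-force step actually finishes:

    # Toy version of the scheme: "compress" 2 bytes into a 1-byte checksum,
    # then "decompress" by brute-forcing every 2-byte input that matches.

    def checksum(data: bytes) -> int:
        """A 1-byte checksum: sum of the bytes mod 256."""
        return sum(data) % 256

    def compress(data: bytes) -> int:
        return checksum(data)

    def decompress(check: int) -> list[bytes]:
        """Return every 2-byte input whose checksum matches."""
        return [bytes([a, b])
                for a in range(256) for b in range(256)
                if checksum(bytes([a, b])) == check]

    original = b"hi"
    candidates = decompress(compress(original))
    print(f"checksum {compress(original)} matches {len(candidates)} inputs")
    # checksum 209 matches 256 inputs -- no way to tell which was the original

The full-size version has the same problem, only bigger: each 4-byte checksum stands for about 256 different 5-byte inputs (2^40 / 2^32), so the decompressor cannot know which one to return.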
Took me a good hour or so before I caught up with reality.
DOS International was a German magazine in the early nineties with excellent technical explanations and program listings. One of them was a super compressor that used your recursive technique. I didn't quite understand the details that were given in the description of the algorithm (I blamed my mediocre understanding of German), but it sounded legit.
So I spent a good hour typing in a page of impossible-to-follow C code with obscure numbers in lookup tables, and all it did when I ran the program was print out "April 1., April 1.".
I had another harebrained idea once. Let's say you want to compress the number 5384615385. Find two numbers x and y such that the decimal part of x/y is 5384615385; in this case, that's 7/13. So the compressed output is just 7 and 13, which can be encoded very succinctly, saving lots of storage space, and decompression is just carrying out the division.
Unfortunately, while the idea works for some input sequences, any finite digit string is technically rational, but most sequences would require numerators/denominators that are at least as large as the input itself. So it's not practically feasible.
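For what it's worth, the machinery for finding such small fractions (continued fractions) is a one-liner in Python. A quick hypothetical sketch shows both the nice case from the example above and the catch; the second digit string is just an arbitrary example of mine:

    from fractions import Fraction

    # The nice case: the digits 5384615385 are (a rounding of) the decimal
    # expansion of 7/13, and a continued-fraction search recovers it.
    nice = Fraction(5384615385, 10 ** 10)      # i.e. 0.5384615385
    print(nice.limit_denominator(1000))        # 7/13

    # The catch: for an arbitrary digit string the exact fraction barely
    # reduces, so storing numerator and denominator costs more than the
    # ten digits we started with...
    arbitrary = Fraction(7294018356, 10 ** 10)
    print(arbitrary)                           # 1823504589/2500000000
    # ...and the best fraction with denominator <= 1000 no longer
    # reproduces all ten digits.
    print(arbitrary.limit_denominator(1000))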
A variation on this would work, but alas we don't have the math tools as yet.
The idea (not mine) is that you can think of data as a very large number: a 4096-bit file is just one big integer.
Well, we have a short way to generate big numbers: x^y.
So given a big number (say 800,000 bits) we could generate a (hopefully short) expression of the form a^b ± c^d ± ... etc.
Unfortunately the "factorizing" (and indeed the expansion) of a large number in this way can't (currently) be done quickly.
But in concept enormously large binary files could be compressed to tiny strings.
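To make the bookkeeping concrete, here is a crude hypothetical stand-in for the a^b ± c^d search (it just greedily subtracts the largest perfect square each time, which is not the clever decomposition imagined above), counting how many bits the resulting terms would take to store:

    import math
    import secrets

    # Greedy stand-in: repeatedly subtract the largest perfect square <= n,
    # record the (base, exponent) terms, and count the bits to store them.

    def decompose(n: int) -> list[tuple[int, int]]:
        terms = []
        while n > 3:
            a = math.isqrt(n)          # largest a with a**2 <= n
            terms.append((a, 2))
            n -= a * a
        if n:
            terms.append((n, 1))       # tiny leftover
        return terms

    n = secrets.randbits(4096)
    terms = decompose(n)
    assert sum(a ** b for a, b in terms) == n      # it does reconstruct n

    stored = sum(a.bit_length() + b.bit_length() for a, b in terms)
    print(f"{len(terms)} terms, {stored} bits stored vs {n.bit_length()} bits of input")
    # The first base alone is ~2048 bits, the next ~1024, and so on: the
    # terms together need about as many bits as the original number.

However the decomposition is tuned, the counting argument below says a typical input can't come out smaller.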
It would not work for arbitrary inputs. See the pigeonhole principle: you can’t represent all possible n-bit values with fewer than n bits, because otherwise you’d have at least one case where multiple input values map to the same output value. On decompression, which one do you go with?
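The counting behind that is easy to check directly; here using n = 40 for the 5-byte inputs from the checksum story above:

    # There are 2**n possible n-bit inputs, but only 2**n - 1 bit strings
    # that are strictly shorter (all lengths 0 through n-1 combined).
    n = 40
    inputs = 2 ** n
    shorter = sum(2 ** k for k in range(n))    # 2**0 + 2**1 + ... + 2**(n-1)
    print(inputs, shorter)                     # 1099511627776 1099511627775
    assert shorter == inputs - 1               # at least two inputs must share an output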
And if that’s not convincing, then consider that any perfect compression scheme would be able to compress its own output even smaller, until you end up with a single bit output for any possible input.
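You can watch that failure with any real compressor. A quick sketch with zlib, feeding its own output back in:

    import zlib

    # If a compressor always shrank its input, re-running it would take any
    # file down to almost nothing. In practice the first pass helps, the
    # output then looks random, and later passes only add overhead.
    data = open(__file__, "rb").read()         # this script itself as input
    print(f"original: {len(data)} bytes")
    for i in range(5):
        data = zlib.compress(data, 9)
        print(f"after pass {i + 1}: {len(data)} bytes")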
So no, that wouldn’t work in general. Some specific values may compress well, but most can’t. It’s not that finding the right answers is difficult; it’s that for most inputs there is no right answer to find.
To add to this, useful compression algorithms exploit patterns in the input data. A common pattern to target is repetition, since most files contain lots of repeated byte sequences.
The pattern that factorization targets is numbers that factor well. I doubt that’s a pattern you’d find in any file worth compressing; it has no clear relationship to the file’s contents.
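A two-line experiment shows the kind of pattern that does pay off, and what happens when there is none:

    import os
    import zlib

    patterned = b"the quick brown fox " * 500     # 10,000 bytes of repetition
    random_data = os.urandom(10_000)              # 10,000 bytes with no pattern

    print(len(zlib.compress(patterned, 9)))       # a few hundred bytes at most
    print(len(zlib.compress(random_data, 9)))     # slightly more than 10,000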
Aside from the cost of factorizing, I suspect decompression would also be slow: exponentiation, multiplication and addition on numbers of that size could still end up prohibitively expensive.
In addition to the performance problems, there's the basic fact that no such scheme can possibly work (as proven by the pigeonhole principle).
High-compression schemes that actually work compress high-likelihood inputs and expand low-likelihood inputs, by exploiting the characteristics that make inputs likely in the first place (e.g., redundant, highly patterned text). Schemes that are agnostic about the input, like the one described here, are as likely to expand any given input as they are to compress it.