The 'uninteresting' metadetails are what get people to the interesting details, so your first instinct was probably right. But I don't think it matters much in your case: you have enough local reputation to just blogpost, plus you're around to talk about the stuff in either form.
It's a modified BuzHash with the following changes:
- Substitution table is pseudorandomly permuted (NB: like Borg).
- Initial 32-bit state is derived from key.
- Window size varies slightly with the key (by up to ~1/4).
- Digest is scrambled with a 32-bit block cipher.
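For concreteness, here's a minimal Python sketch of such a keyed BuzHash. All the key-derivation details are my assumptions (SHA-256 as the KDF, a toy 4-round Feistel network standing in for the unspecified 32-bit block cipher, a freshly generated table instead of a permuted fixed one); it illustrates the four changes rather than reproducing the actual scheme.

```python
import collections
import hashlib
import random

MASK32 = 0xFFFFFFFF

def rotl32(x, n):
    """Rotate a 32-bit value left by n bits."""
    n %= 32
    return ((x << n) | (x >> (32 - n))) & MASK32

class KeyedBuzHash:
    """Sketch of the keyed BuzHash variant listed above.
    Assumptions: SHA-256 as KDF; a toy Feistel as the digest scrambler."""

    def __init__(self, key: bytes, base_window: int = 64):
        # 1. key-dependent substitution table (the real scheme permutes a
        #    fixed table; deriving 256 random words is equivalent in spirit)
        rng = random.Random(hashlib.sha256(key + b"table").digest())
        self.table = [rng.getrandbits(32) for _ in range(256)]
        # 2. initial 32-bit state derived from the key
        self.h0 = int.from_bytes(hashlib.sha256(key + b"init").digest()[:4], "big")
        # 3. window size jittered by up to ~1/4 of the base size
        j = int.from_bytes(hashlib.sha256(key + b"window").digest()[:4], "big")
        self.window = base_window + j % (base_window // 4 + 1)
        # 4. round keys for the digest scramble
        self.rk = [int.from_bytes(hashlib.sha256(key + bytes([i])).digest()[:2], "big")
                   for i in range(4)]
        self.h = self.h0
        self.buf = collections.deque()

    def roll(self, byte: int) -> None:
        """Slide the window one byte forward."""
        self.h = rotl32(self.h, 1)
        if len(self.buf) == self.window:
            out = self.buf.popleft()
            # cancel the outgoing byte's (now window-times-rotated) term
            self.h ^= rotl32(self.table[out], self.window)
        self.h ^= self.table[byte]
        self.buf.append(byte)

    def digest(self) -> int:
        """32-bit digest, scrambled by a toy 4-round Feistel 'cipher'."""
        l, r = self.h >> 16, self.h & 0xFFFF
        for k in self.rk:
            l, r = r, l ^ (((r ^ k) * 0x9E3B + 1) & 0xFFFF)
        return (l << 16) | r
```

A chunker built on this would declare a boundary whenever `digest() & mask == 0` for some key-independent mask, same as with plain BuzHash.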
I also proposed adding (unspecified) padding before encrypting chunks to further complicate discovering their plaintext lengths. Glad to see I was on the right track :)
Could this be mitigated by randomising the block upload order?
A fresh backup will be uploading thousands of blocks. You don't want to create all the blocks before uploading, but a buffer of a hundred might be enough?
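Something like a bounded reshuffling buffer, i.e. hold up to N blocks and always send a random one (function and parameter names here are hypothetical):

```python
import random

def shuffled_uploads(blocks, upload, buffer_size=100, rng=None):
    """Upload blocks in a randomly perturbed order while holding at most
    `buffer_size` blocks in memory at once."""
    rng = rng or random.SystemRandom()
    buf = []
    for block in blocks:
        buf.append(block)
        if len(buf) >= buffer_size:
            # swap a randomly chosen buffered block to the end and send it
            i = rng.randrange(len(buf))
            buf[i], buf[-1] = buf[-1], buf[i]
            upload(buf.pop())
    # drain the remainder in random order
    rng.shuffle(buf)
    for block in buf:
        upload(block)
```

This only perturbs order locally (a block can't be delayed past the point where `buffer_size` later blocks have arrived), so it hides fine-grained ordering but not the coarse progression through the backup.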
> The reason rolling hashes are relevant to our topic is that many file backup services use variations of rolling hashes to achieve CDC. This paper will primarily look at Tarsnap [9], a project by the second author, but we will also look at other schemes such as Borg [2] and Restic [6].
> It seems like compression as default (or even required) is important. Without compression, Borg and Restic are susceptible to known plaintext attacks. With compression, we still have theoretically sound (and harder) chosen-plaintext attacks but no known-plaintext attacks. Sadly, compression can also leak information for post-parameter extraction attacks, as shown in Example 4.3.
My reading is that the primary vector is based on the size of the chunks (due to deterministic chunking and length-preserving encryption). Would padding chunks with random-length data (prior to encryption) help mitigate this at the cost of additional storage (and complexity)?
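As a sketch of what I mean (hypothetical framing I made up: a 4-byte length prefix plus random filler, applied to each chunk before it's encrypted):

```python
import os
import struct

def pad_chunk(chunk: bytes, max_pad: int = 4096) -> bytes:
    """Prepend the true length and append random-length filler, so that the
    ciphertext length no longer reveals the plaintext chunk length.
    The framed result is what would then get encrypted."""
    pad_len = int.from_bytes(os.urandom(2), "big") % (max_pad + 1)
    return struct.pack(">I", len(chunk)) + chunk + os.urandom(pad_len)

def unpad_chunk(padded: bytes) -> bytes:
    """Recover the original chunk after decryption."""
    (n,) = struct.unpack(">I", padded[:4])
    return padded[4:4 + n]
```

The storage overhead averages `max_pad / 2` bytes per chunk, and an attacker observing many ciphertexts can still estimate the underlying length distribution statistically, so this raises the cost of the attack rather than eliminating it.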
borg has supported the "obfuscate" pseudo compressor since 1.2.0 (Feb 2022); it adds random extra length to the chunks, using one of 2 different algorithms to determine the extra length.