It seems that the filesystem-level deduplication approach with BTRFS conflicts with your goal of an easily shareable format. A filesystem is not an ideal archive format, especially one that is, for practical purposes, only compatible with Linux. And over a long enough time horizon, filesystems become obsolete and difficult to mount.
Second, it is going to take contortions to use BTRFS deduplication with MPQ archives: hacking archives apart into separate extents, fighting wasted space from block alignment, and so on. You're talking about low-level filesystem manipulation to optimize for this peculiar use case, and then you would need custom tooling anyway to reconstitute the MPQ files from this filesystem of exploded fragments.
As a simpler, filesystem-agnostic approach, I would move all of the file data into a single large content-addressable storage blob, which is deduplicated by construction. Then strip the file data out of the MPQ archives, leaving behind pointers into that blob. Reconstituting an archive from the pointers is straightforward.
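To make that concrete, here is a minimal sketch of such a blob store in Python. The class name, file layout, and JSON index format are my own illustrative choices, not an established format:

```python
import hashlib
import json
import os

class BlobStore:
    """Append-only content-addressable blob with a hash -> (offset, length) index.

    A minimal sketch; the blob/index file layout here is arbitrary.
    """

    def __init__(self, blob_path, index_path):
        self.blob_path = blob_path
        self.index_path = index_path
        self.index = {}
        if os.path.exists(index_path):
            with open(index_path) as f:
                self.index = json.load(f)

    def store(self, data: bytes) -> str:
        """Store data if unseen; return its SHA-256 hex digest (the pointer)."""
        digest = hashlib.sha256(data).hexdigest()
        if digest not in self.index:
            with open(self.blob_path, "ab") as blob:
                blob.seek(0, os.SEEK_END)  # append at the end of the blob
                offset = blob.tell()
                blob.write(data)
            self.index[digest] = (offset, len(data))
        return digest

    def fetch(self, digest: str) -> bytes:
        """Look up a pointer in the index and read the bytes back out."""
        offset, length = self.index[digest]
        with open(self.blob_path, "rb") as blob:
            blob.seek(offset)
            return blob.read(length)

    def save_index(self):
        with open(self.index_path, "w") as f:
            json.dump(self.index, f)
```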
This approach does not rely on filesystem-specific deduplication, is portable to other platforms, and largely avoids the problem of wasted space due to filesystem structures. However, it would be a non-standard archive format, and you would have to make sure your scripts are well-documented enough that they will be usable in the future.
In more detail, to prepare your collection, you would write a script to do these steps (a sketch follows the list):
1. Enumerate each file in each MPQ archive and compute the SHA-256 hash of its data.
2. If this hash has never been seen before, append the file's data to your large storage blob, and record the hash => (offset, length) mapping in an index file.
3. Strip the file data out of the MPQ archive, leaving behind the SHA-256 hash as a pointer you will use later to look up the (offset, length) in the index and recover the file.
4. Record the SHA-256 hash of the entire original MPQ archive, so you can later verify you've recreated it without loss of data integrity.
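A sketch of that preparation pass, building on the BlobStore above. Note that iter_mpq_entries and write_stripped_mpq are hypothetical helpers: real MPQ access would go through something like StormLib or the mpyq Python library, and the stub format is whatever you design it to be:

```python
import hashlib

def prepare(mpq_paths, store):
    """Steps 1-4: hash every contained file, dedupe into the blob,
    and replace each archive's file data with SHA-256 pointers."""
    archive_hashes = {}
    for path in mpq_paths:
        # Step 4: hash of the intact original archive, for later verification.
        with open(path, "rb") as f:
            archive_hashes[path] = hashlib.sha256(f.read()).hexdigest()

        stub_entries = []
        for name, data in iter_mpq_entries(path):   # hypothetical MPQ reader
            digest = store.store(data)              # steps 1-2: hash + dedupe
            stub_entries.append((name, digest))     # step 3: keep pointer only

        write_stripped_mpq(path + ".stub", stub_entries)  # hypothetical writer
    store.save_index()
    return archive_hashes
```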
To reverse the process, you just pull the pieces out of the storage blob and reconstitute the original MPQ archive, then check the hash of the recreated file against the recorded original.
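And the reverse pass, again with hypothetical read_stripped_mpq / rebuild_mpq helpers standing in for your stub format and MPQ writer:

```python
import hashlib

def reconstitute(stub_path, out_path, store, expected_hash):
    """Rebuild an MPQ from its stub, then verify it bit-for-bit."""
    entries = []
    for name, digest in read_stripped_mpq(stub_path):  # hypothetical reader
        entries.append((name, store.fetch(digest)))
    rebuild_mpq(out_path, entries)                     # hypothetical writer

    with open(out_path, "rb") as f:
        actual = hashlib.sha256(f.read()).hexdigest()
    if actual != expected_hash:
        raise ValueError(f"hash mismatch for {out_path}")
```

Getting a bit-identical rebuild means the stub has to preserve everything except the raw file data (headers, hash tables, per-file compression settings), which is exactly what the whole-archive hash from step 4 lets you verify.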
The stripped archives should be of negligible size, and no further attempt to compress or deduplicate them is necessary. The storage blob, though, is worth compressing with something like the Zstandard Seekable Format, which still allows random access into the compressed stream. This will work best if you decompress the file data before hashing and storing it, then re-compress it when reconstituting the MPQ archive.
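For that compression step, one option (an assumption on my part; any seekable-format implementation will do) is the third-party pyzstd package, whose SeekableZstdFile, available in recent releases, implements the seekable format:

```python
import pyzstd  # third-party; SeekableZstdFile requires a recent pyzstd release

def compress_blob(src_path: str, dst_path: str) -> None:
    """Re-pack the raw blob into the Zstandard Seekable Format."""
    with open(src_path, "rb") as src, \
         pyzstd.SeekableZstdFile(dst_path, "w") as dst:
        while chunk := src.read(1 << 20):  # 1 MiB at a time
            dst.write(chunk)

def fetch_compressed(blob_path: str, offset: int, length: int) -> bytes:
    """Read one stored file without decompressing everything before it.
    (offset, length) come straight from the index; seek() works in
    uncompressed coordinates."""
    with pyzstd.SeekableZstdFile(blob_path, "r") as blob:
        blob.seek(offset)
        return blob.read(length)
```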
A BTRFS subvolume can be exported as a snapshot (btrfs send), and that can be passed around rather easily.
I know the concern about mounting BTRFS, but there are Windows drivers for that as well, and nowadays you can also mount it via WSL to get proper Linux tooling.
The approach of having a custom storage blob with pointer references is something I've been considering as well; I'll play around with that during the holidays and do some experimenting.
Thanks for your input.
Basically, I described how any content-addressable storage system works (git included), with the complication that the content here is spread across container files: MPQ in this case, but just as applicable to tarballs or anything else.
The difference between my suggestion and BTRFS deduplication is that the filesystem works on a block basis, which might win for some kinds of files, but for game assets I think deduplicating on a whole-file basis is good enough, if not better.