this is either a very big coincidence, or you are in the datamining discord as well.
the original archive i base my project on uses RMAN to store everything :D
---
thanks for the hint about the FICLONERANGE ioctl... it seems to be fine-grained enough to allow me to deduplicate on arbitrary offsets, not just whole blocks.
will give it a go.
i tried FICLONERANGE via a python wrapper btw - it turns out that i can only clone ranges aligned to block boundaries :(
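for reference, a minimal sketch of what such a wrapper can look like (not necessarily the one i used). the helper names are made up for illustration, the ioctl number is the usual linux value of _IOW(0x94, 13, struct file_clone_range), and using st_blksize as the filesystem block size is an assumption. the alignment check is also a bit stricter than the kernel, which as far as i can tell lets the length be unaligned when the range ends at the source file's EOF.

```python
import fcntl
import os
import struct

# Linux value of _IOW(0x94, 13, struct file_clone_range); Linux-only.
FICLONERANGE = 0x4020940D


def pack_clone_range(src_fd, src_offset, length, dest_offset):
    # struct file_clone_range { __s64 src_fd; __u64 src_offset;
    #                           __u64 src_length; __u64 dest_offset; }
    return struct.pack("qQQQ", src_fd, src_offset, length, dest_offset)


def is_block_aligned(src_offset, length, dest_offset, block_size):
    # Hypothetical helper: True if all three values are multiples of block_size.
    return all(v % block_size == 0 for v in (src_offset, length, dest_offset))


def clone_range(src_path, dest_path, src_offset, length, dest_offset):
    # Hypothetical helper: share `length` bytes of src with dest via reflink.
    with open(src_path, "rb") as src, open(dest_path, "r+b") as dest:
        # Assumption: st_blksize matches the filesystem block size.
        block_size = os.fstat(dest.fileno()).st_blksize
        if not is_block_aligned(src_offset, length, dest_offset, block_size):
            raise ValueError(f"range is not aligned to {block_size} bytes")
        arg = pack_clone_range(src.fileno(), src_offset, length, dest_offset)
        # Raises OSError (e.g. EINVAL, EOPNOTSUPP, EXDEV) if the clone fails.
        fcntl.ioctl(dest.fileno(), FICLONERANGE, arg)
```

calling something like clone_range("a.bin", "b.bin", 0, 4096, 8192) should only work when both files live on the same reflink-capable filesystem (btrfs, xfs with reflink enabled, ...).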
BTRFS is very neat per se, but documentation and help (most of all in very niche cases like this one here lol) are not that easy to come by.
my plan would be to properly process the data set, and then make it available as a BTRFS snapshot... you can also write the btrfs send stream to a file for storage etc.
if all my attempts to use BTRFS fail, i might have to write my own tooling and virtual filesystem as well, but optimized for my use case (MPQ files and such).
thanks for your input so far.