even though I do not like it too much, i think i will have to pry apart the MPQ files into it's distinct parts, and write them down to the filesystem individually (and then deleting the original file) - basically what i wanted to do with the extents, but instead have them as distinct files.
this then can be reversed via script to assemble the original archive file on demand to get a byte-by-byte equal file again.
writing the parts down to the filesystem will cause the parts to be properly block aligned, and be able to be hardlinked, if they exist multiple times on the filesystem - this cuts down on metadata even more and also boosts performance when doing block/extent deduplication, since a single inode is only processed once in most proper deduplication programs.
the MPQ files range from a few MiB to around 2.5 GiB.
since the access should be rather fast, pairing them as an archive file is not an option for me.
thanks about the hint of the merkle trees, i will read up on what that is... always good to know about different approaches to a problem :)
I'm not sure about how it works in WoW but I can give you a warning about what happens in StarCraft. To prevent other people from editing StarCraft maps (which are MPQs), users would intentionally mangle the MPQ format in just the right way so that the game could still play it but other tools could not open it for editing. So, if there is anything like that going on in WoW world then it might be very hard to reassemble the original MPQs and get a byte for byte match.
shouldn't be the case, and i would implement a proper verify when exploding the MPQ file, by running a reassamble and hash comparison right afterwards.
the MPQ files range from a few MiB to around 2.5 GiB. since the access should be rather fast, pairing them as an archive file is not an option for me.
thanks about the hint of the merkle trees, i will read up on what that is... always good to know about different approaches to a problem :)