I think you should tighten up your goals and state them as specific, measurable benchmarks:
- Bring the size down: you're already more than 90% smaller than the original, so where do you stop? The low-hanging fruit has been picked via jdupes, and slimming down the MPQ redundancy looks like it will require more compute/runtime. How much are you willing to spend to gain an extra few percent?
- Easily shareable format: BTRFS doesn't sound like an easily shareable format, especially with the hardlinks. How will you share it? Even several tens or hundreds of GB can be expensive if people download it often.
- What's your definition of a lower-end machine? SSD only, or is a spinning-rust hard disk allowed? How much RAM? What minimum CPU? Linux/Unix only?
Wouldn't a WoW/Blizzard-programming-specific forum be a more relevant community to get help?
Looking at the docs for the MPQ file format, it looks like recreating exact copies of every MPQ file may be more work than it's worth, space-savings-wise.
As you suggest, carving out the game-specific art assets (the cut-scenes/cinematics probably take up much of the space), replacing them with zeros or some other easily compressible data, and compressing the remaining MPQ husk for archiving would save you a lot of space for relatively little work/time.
To restore, decompress the archived MPQ husk and write the assets back into their original byte ranges to get the original file back.
Then you'd just dedupe the art assets.
So, some tool/script which can tell you the byte range within the MPQ file for each of those assets will get you much of the way.
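Here's a rough sketch of the carve/restore idea, just to show how little code the husk part needs. It assumes you already have the byte ranges from some MPQ-aware tool (not shown here). The `carve` and `restore` names, the `ranges` list of `(offset, length)` pairs, and the store layout (one blob per SHA-256 digest) are all made up for illustration. It also reads each range fully into memory, so a real tool would want to stream large cinematics in chunks.

```python
import hashlib
import os
import shutil


def carve(mpq_path, husk_path, ranges, store_dir):
    """Copy mpq_path to husk_path, save each (offset, length) range into
    store_dir under its SHA-256 digest, and zero that range in the husk.

    `ranges` is hypothetical input: it must come from a tool that knows
    where each asset sits inside the MPQ file.
    Returns a manifest of (offset, length, digest) tuples for restore().
    """
    os.makedirs(store_dir, exist_ok=True)
    shutil.copyfile(mpq_path, husk_path)
    manifest = []
    with open(husk_path, "r+b") as husk:
        for offset, length in ranges:
            husk.seek(offset)
            blob = husk.read(length)
            if len(blob) != length:
                raise ValueError(f"range {offset}+{length} runs past end of file")
            digest = hashlib.sha256(blob).hexdigest()
            blob_path = os.path.join(store_dir, digest)
            # Identical byte ranges share one blob: this is the dedupe step.
            if not os.path.exists(blob_path):
                with open(blob_path, "wb") as out:
                    out.write(blob)
            # Overwrite the asset with zeros so the husk compresses well.
            husk.seek(offset)
            husk.write(bytes(length))
            manifest.append((offset, length, digest))
    return manifest


def restore(husk_path, restored_path, manifest, store_dir):
    """Rebuild the original file by writing each stored blob back into
    its original byte range in a copy of the husk."""
    shutil.copyfile(husk_path, restored_path)
    with open(restored_path, "r+b") as out:
        for offset, length, digest in manifest:
            with open(os.path.join(store_dir, digest), "rb") as blob_file:
                blob = blob_file.read()
            if len(blob) != length:
                raise ValueError(f"blob {digest} has unexpected size")
            out.seek(offset)
            out.write(blob)
```

You'd then compress the husk with whatever you like and keep the manifest next to it. Note that two copies of the same asset only collapse into one blob here if their stored bytes inside the MPQ files are identical.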