It looks like you have basically outlined the solution in your description of the file format.
If the layout is { header, files[], hash_table[], block_table[] }, then you can likely de-duplicate based on the block_table or the hash_table (I'm not familiar with the format itself). It may be worth checking how many of your MPQ files contain identical blocks: checksum or hash each block, and compare on more than one checksum so a single collision can't fool you. If there is overlap and some blocks appear in multiple files, you can extract those blocks and store each one only once. Any version of a file can then be reconstituted back into an MPQ file, as long as you record which blocks to combine and in which order.
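To make the idea concrete, here is a minimal sketch of block-level de-duplication in Python. It does not parse MPQ at all: it assumes you have already split each archive into a list of raw blocks (for example by walking its block table). `BlockStore`, `add_file`, `reconstruct` and `stored_bytes` are hypothetical names made up for illustration. Each block is keyed on two independent digests plus its length, and each archive is stored as an ordered "recipe" of block keys.

```python
import hashlib
from typing import Dict, List, Tuple

# A block is identified by two independent digests plus its length, so a
# collision in one hash alone cannot merge two different blocks.
BlockKey = Tuple[str, str, int]


def block_key(data: bytes) -> BlockKey:
    return (
        hashlib.sha256(data).hexdigest(),
        hashlib.blake2b(data).hexdigest(),
        len(data),
    )


class BlockStore:
    """Hypothetical content-addressed store: each unique block is kept once."""

    def __init__(self) -> None:
        self.blocks: Dict[BlockKey, bytes] = {}
        # Per-archive recipe: the ordered list of block keys it is built from.
        self.recipes: Dict[str, List[BlockKey]] = {}

    def add_file(self, name: str, blocks: List[bytes]) -> None:
        recipe = []
        for data in blocks:
            key = block_key(data)
            self.blocks.setdefault(key, data)  # keep only the first copy
            recipe.append(key)
        self.recipes[name] = recipe

    def reconstruct(self, name: str) -> bytes:
        # Concatenate blocks in their recorded order. Producing a valid MPQ
        # may also require rewriting offsets in its tables (not handled here).
        return b"".join(self.blocks[key] for key in self.recipes[name])

    def stored_bytes(self) -> int:
        return sum(len(b) for b in self.blocks.values())


store = BlockStore()
store.add_file("v1.mpq", [b"header", b"AAAA", b"BBBB"])
store.add_file("v2.mpq", [b"header", b"AAAA", b"CCCC"])
print(store.stored_bytes())  # 18: the shared "header" and "AAAA" are stored once
print(store.reconstruct("v2.mpq") == b"headerAAAACCCC")  # True
```

The two toy archives hold 28 bytes of blocks between them, but only 18 bytes are stored because the shared blocks are kept once. How much you actually save on real MPQ files depends on how often identical blocks recur, which is exactly what the checksum survey is meant to tell you.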