I used a hash table to begin with, applying a simplistic digest function to the message headers I was comparing, which gave me a 4-byte hash key.
That worked, but was kind of slow.
Finally, the idea came to me to not apply _any_ digesting, and instead use the concatenated headers of each message as the hash key. About 2k bytes per hash key!
The result: about a 20x perf improvement, if memory serves.
How is that possible? The reason is that the code all runs in a JavaScript engine, and the digest was not a built-in function; it was my own code looping over the headers and doing the arithmetic. Thousands upon thousands of JS abstract machine steps. Using such a large hash key may be inefficient in principle, but it's just one JS object / dictionary operation per message, and that is one of the most heavily optimized operations in any implementation.
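To make the comparison concrete, here is a minimal sketch of the two approaches. It is not the add-on's actual code: the field names (`author`, `subject`, `date`), the newline separator, and the particular 32-bit digest are all assumptions chosen for illustration.

```javascript
// Hypothetical message shape: { author, subject, date }.
// The real add-on compares a configurable set of headers.
const FIELDS = ["author", "subject", "date"];

// Concatenate the compared headers into one string.
// Assumes header values contain no raw newlines.
function joinHeaders(msg) {
  return FIELDS.map((f) => String(msg[f])).join("\n");
}

// Approach 1: a hand-rolled 32-bit digest (illustrative, not the original one).
// Every character costs several interpreted JS operations.
function digest(str) {
  let h = 0;
  for (let i = 0; i < str.length; i++) {
    h = (Math.imul(h, 31) + str.charCodeAt(i)) | 0;
  }
  return h;
}

// Group messages by whatever key keyFn produces.
function groupBy(messages, keyFn) {
  const buckets = new Map();
  for (const msg of messages) {
    const key = keyFn(msg);
    const bucket = buckets.get(key);
    if (bucket) {
      bucket.push(msg);
    } else {
      buckets.set(key, [msg]);
    }
  }
  return buckets;
}

// Keep only the groups that contain more than one message.
function duplicateSets(messages, keyFn) {
  return [...groupBy(messages, keyFn).values()].filter((b) => b.length > 1);
}

// Approach 1: small integer keys, digest computed in JS.
const viaDigest = (messages) =>
  duplicateSets(messages, (m) => digest(joinHeaders(m)));

// Approach 2: the full concatenated string is the key; the engine's
// native string hashing does the work.
const viaFullKey = (messages) => duplicateSets(messages, joinHeaders);
```

Besides skipping the per-character loop, the full-key version also avoids false matches: a 4-byte digest can collide for different header sets, whereas two messages share a full key only if their compared headers are identical.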
https://addons.thunderbird.net/en-US/thunderbird/addon/remov...