> In texts transcribed so far, a full one-third of the words contained one or more typos, places where the OCR guessed the wrong letter. [...] Still, the software got 96 percent of all handwritten letters correct.
96% correct sounds pretty good but that's still multiple errors per sentence! The threshold for truly "error-free" is quite high...
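For a sense of scale, a naive back-of-envelope (treating character errors as independent, which real OCR errors aren't, so this is only a rough sketch):

```typescript
// Rough back-of-envelope: if each character is read correctly with
// probability 0.96 and errors were independent, how often is a whole
// word or sentence clean?
const charAccuracy = 0.96;

// Probability an n-character word comes out with no errors at all.
const cleanWord = (n: number): number => Math.pow(charAccuracy, n);

console.log(cleanWord(5).toFixed(3));   // ~0.815 -> roughly 1 in 5 five-letter words has a typo
console.log(cleanWord(10).toFixed(3));  // ~0.665 -> ~1 in 3 for longer words, close to the "one-third" figure

// A ~100-character sentence expects about 100 * 0.04 = 4 wrong characters.
const sentenceLength = 100;
console.log((sentenceLength * (1 - charAccuracy)).toFixed(1)); // ~4 errors per sentence
```

Under that (crude) assumption, 96% per-character accuracy and "one-third of words contain a typo" are not really in tension; they're two views of the same error rate.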
I would also be curious about the kind of errors: is it obvious gibberish, or does it pick a valid but wrong word, which takes more understanding to recognize as incorrect?
In my experience OCRing text, even a high error rate does not impair readability much, thanks to the high redundancy of English. It's just annoying (and hurts searchability, of course).
It definitely slows reading since you have to think about alternatives. I was curious because I've noticed that many ML systems produce errors different in kind from those of past systems, and many places aren't expecting that in their quality-review process.
For example, I've built web systems that display search results as highlights overlaid on the page image, so you never see the OCR gremlins. But if a failure produces valid words rather than gibberish, that raises the false-positive rate and can be confusing for the user.
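Roughly the idea, sketched below (the names and the word-geometry shape are made up for illustration, not any particular viewer's API): keep the word bounding boxes the OCR engine emits alongside the text, and draw translucent boxes over the page scan for the matched words, so the reader only ever sees the original image.

```typescript
// Hypothetical sketch: highlight search hits on top of the page scan
// instead of showing the (possibly wrong) OCR text itself.
interface OcrWord {
  text: string;                 // what the engine thinks the word is
  x: number; y: number;         // bounding box in image pixel coordinates
  width: number; height: number;
}

function overlayHits(container: HTMLElement, img: HTMLImageElement,
                     words: OcrWord[], query: string): void {
  // Scale image-pixel coordinates to the displayed size.
  const scaleX = img.clientWidth / img.naturalWidth;
  const scaleY = img.clientHeight / img.naturalHeight;
  for (const w of words) {
    if (w.text.toLowerCase() !== query.toLowerCase()) continue;
    const box = document.createElement("div");
    box.style.position = "absolute";            // container needs position: relative
    box.style.left = `${w.x * scaleX}px`;
    box.style.top = `${w.y * scaleY}px`;
    box.style.width = `${w.width * scaleX}px`;
    box.style.height = `${w.height * scaleY}px`;
    box.style.background = "rgba(255, 230, 0, 0.35)"; // translucent highlight
    container.appendChild(box);
  }
}
```

The catch is exactly the one above: the highlight is only as good as the text the index matched on, and a misread that happens to be a real word gets highlighted just as confidently as a correct hit.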