
I would also be curious about the kind of errors: is it gibberish that's obvious to the reader, or a valid word that takes more understanding to recognize as incorrect?


In my experience OCRing text, even a high error rate does not impair the readability of the text, thanks to the high redundancy of English. It's just annoying (and impairs searchability, of course).


It definitely slows reading, since you have to think about alternatives. I was curious because I’ve noticed that many ML systems produce errors that differ from those of past systems, and many places aren’t expecting that in their quality review process.

For example, I’ve built web systems that display search results using an image overlay, so you never see the OCR gremlins. But if the failure produces valid words rather than gibberish, that increases the false positive rate and can be confusing for the user.
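To make the gibberish vs. valid-word distinction concrete, here is a minimal sketch in Python. It assumes the OCR output is already aligned word-for-word with a ground-truth transcript, and it uses a tiny hypothetical vocabulary in place of a real dictionary; the function name `classify_errors` is illustrative, not from any OCR library.

```python
# Minimal sketch: classify OCR word errors as "non-word" (gibberish that a
# dictionary check would catch) or "real-word" (a valid word that is still
# wrong). Assumes truth and OCR words are already aligned one-to-one; the
# vocabulary below is a hypothetical stand-in for a real dictionary.

def classify_errors(truth_words, ocr_words, vocabulary):
    """Return (truth, ocr, kind) tuples for each mismatched word pair."""
    errors = []
    for truth, ocr in zip(truth_words, ocr_words):
        if truth == ocr:
            continue  # word was recognized correctly
        kind = "real-word" if ocr.lower() in vocabulary else "non-word"
        errors.append((truth, ocr, kind))
    return errors


VOCAB = {"the", "modern", "modem", "cat", "sat", "on", "mat", "a"}

truth = "the modern cat sat on a mat".split()
ocr = "the modem cat s4t on a mat".split()  # "rn" misread as "m", "a" as "4"

for t, o, kind in classify_errors(truth, ocr, VOCAB):
    print(f"{t!r} -> {o!r}: {kind}")
# 'modern' -> 'modem': real-word
# 'sat' -> 's4t': non-word
```

The "s4t" error is the kind a reader or spell-checker spots immediately. The "modem" error passes a dictionary check, and a search index built from this OCR text would return the document for the query "modem", which is exactly the false-positive case that an image overlay hides from the user.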



