Hacker Newsnew | past | comments | ask | show | jobs | submitlogin

Quite simply, you’re completely wrong. Modern tesseract versions include a modern LSTM AI. It can very affordably be deployed on CPU, yet its performance is competitive with much more expensive large GPU-based models. Especially if you handle a high volume of scans, chances are that tesseract will have the best bang per buck.



My company probably spent close to 6 figures overall creating Tesseract 5 custom models for various languages. Surya beats them all and is open source (and quite faster).


Surya weights for the models are licensed cc-by-nc-sa-4.0. They have an exception for small companies. If you're company is not small you either need to pay them or use them illegally.

Their training code and data is closed source. They are barely open weight and only inference is open source.


i remember that you could not train it your self in a font like you could in older versions, it that still the case?




Consider applying for YC's Fall 2025 batch! Applications are open till Aug 4

Guidelines | FAQ | Lists | API | Security | Legal | Apply to YC | Contact

Search: