
What do you use for OCR, I wonder?

If it's an LLM, isn't that too expensive (or too flaky, if cheap) to scale?



I was using an OCR service, but I recently migrated to an LLM. It is actually cheaper and more scalable: it is accurate enough, and with a single prompt I also get more information (it guesses the currency and timezone, translates, and even helps tag the receipts). It currently costs slightly less per receipt scanned than my previous OCR service. Since I provide this as an app rather than as an automation, there is a UI for users to make edits, and they seem happy enough with the results, with very little need to edit.

I have also been playing with open-source VLMs like Qwen3-VL, and they were surprisingly not too far behind. I think with the next generation I should be able to switch over and save a bit more on cost.
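To make the "single prompt" idea concrete, here is a rough sketch of how one request could ask for all the extra fields at once and how the reply might be checked before it reaches the edit UI. The prompt wording, the `Receipt` fields, and `parse_receipt` are all hypothetical; nothing here is the actual schema or code from my app, and the call to the model itself is left out because it depends on the provider.

```python
import json
from dataclasses import dataclass, field

# Hypothetical prompt: the key names below are assumptions for illustration.
PROMPT = (
    "Read the attached receipt image and reply with a single JSON object "
    "with these keys: merchant (string), total (number), "
    "currency (ISO 4217 code, guess if not printed), "
    "timezone (IANA name, guess from the address), "
    "items_en (list of item names translated to English), "
    "tags (list of short category labels)."
)

REQUIRED_KEYS = ("merchant", "total", "currency", "timezone")


@dataclass
class Receipt:
    merchant: str
    total: float
    currency: str
    timezone: str
    items_en: list = field(default_factory=list)
    tags: list = field(default_factory=list)


def parse_receipt(reply: str) -> Receipt:
    """Turn the model's JSON reply into a Receipt, or raise ValueError."""
    try:
        data = json.loads(reply)
    except json.JSONDecodeError as exc:
        raise ValueError(f"reply is not valid JSON: {exc}") from exc
    if not isinstance(data, dict):
        raise ValueError("reply is not a JSON object")
    missing = [key for key in REQUIRED_KEYS if key not in data]
    if missing:
        raise ValueError(f"missing keys: {missing}")
    try:
        total = float(data["total"])
    except (TypeError, ValueError) as exc:
        raise ValueError(f"total is not a number: {data['total']!r}") from exc
    return Receipt(
        merchant=str(data["merchant"]),
        total=total,
        currency=str(data["currency"]).upper(),  # normalize "eur" to "EUR"
        timezone=str(data["timezone"]),
        items_en=list(data.get("items_en", [])),  # optional extras
        tags=list(data.get("tags", [])),
    )
```

Anything that fails validation can be retried or handed to the user with empty fields, which is where having an edit UI helps.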


Thanks for elaborating! Btw, what model is currently cheap yet accurate enough for you?



