Hacker News


kbyatnal 10 months ago | on: Mistral OCR

yeah that's a fun challenge. What we've seen work well is a system that forces the LLM to generate citations for all extracted data, map that back to the original OCR content, and then generate bounding boxes that way. Tons of edge cases for sure that we've built a suite of heuristics for over time, but overall it works really well.
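A minimal sketch of the citation-to-bounding-box step described above, not the commenter's actual system. It assumes the OCR engine returns one box per word as `(x0, y0, x1, y1)` and that the LLM returns a verbatim quote as its citation. The names `OcrWord` and `locate_citation` are hypothetical, and only exact matches after normalization are handled; the edge-case heuristics mentioned above are out of scope.

```python
import re
from dataclasses import dataclass
from typing import List, Optional, Tuple

Box = Tuple[float, float, float, float]  # (x0, y0, x1, y1), assumed layout


@dataclass(frozen=True)
class OcrWord:
    """One OCR token with its box (hypothetical structure)."""
    text: str
    box: Box


def _norm(s: str) -> str:
    # Lowercase and drop everything except letters and digits, so
    # "Total:" and "total" compare equal.
    return re.sub(r"[^a-z0-9]", "", s.lower())


def locate_citation(citation: str, words: List[OcrWord]) -> Optional[Box]:
    """Find the first run of OCR words matching the cited quote and
    return the union of their boxes, or None if there is no match."""
    target = [t for t in (_norm(w) for w in citation.split()) if t]
    if not target:
        return None
    tokens = [_norm(w.text) for w in words]
    n = len(target)
    for i in range(len(tokens) - n + 1):
        if tokens[i:i + n] == target:
            boxes = [w.box for w in words[i:i + n]]
            return (
                min(b[0] for b in boxes),
                min(b[1] for b in boxes),
                max(b[2] for b in boxes),
                max(b[3] for b in boxes),
            )
    return None


# Usage with made-up OCR output for a single line of an invoice.
words = [
    OcrWord("Invoice", (10, 10, 60, 20)),
    OcrWord("Total:", (10, 30, 50, 40)),
    OcrWord("$1,250.00", (55, 30, 110, 40)),
]
found = locate_citation("total: $1,250.00", words)
missing = locate_citation("Subtotal", words)
```

Returning `None` for an unmatched citation gives a simple signal that the extracted value could not be grounded in the OCR text, which is where fuzzy matching or other heuristics would presumably come in.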

dontlikeyoueith 10 months ago

Why would you do this and not use Textract?

schcrosby 10 months ago

I too have this question.
