ritvikpandey21's comments | Hacker News

DeepSeek AI just released DeepSeek-OCR, a new open-source model that aims to rethink text extraction through what it calls Context Optical Compression. The launch quickly caught attention on X and GitHub, with many celebrating another big step in open document AI.

At Pulse, we were curious how it performs on the kinds of messy, high-density documents that power real business workflows. So we ran DeepSeek-OCR through our standard evaluation suite: multi-page PDFs, handwritten forms, nested tables, and scanned statements. The results were promising in theory but inconsistent in practice.


interesting read


We processed hundreds of millions of pages and found that a single accuracy metric is misleading. A model that's 98% accurate on 1,000 pages with 200 data elements each still produces 4,000 incorrect values. The real killers are broken reading order in multi-column layouts, shifted table columns, and lost cross-page context, all of which silently corrupt datasets without throwing errors.
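That arithmetic is easy to verify; a quick sketch (the page and element counts are the ones quoted above):

```python
# Expected-error estimate for a per-element accuracy metric.
pages = 1_000
elements_per_page = 200
accuracy = 0.98

total_elements = pages * elements_per_page          # 200,000 data elements
expected_errors = total_elements * (1 - accuracy)   # 2% of them are wrong
print(f"{expected_errors:.0f} incorrect values")
```

The point being: "98% accurate" sounds fine until you multiply it out against real document volumes.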


We evaluated ByteDance's Dolphin document parsing model on enterprise document processing tasks using standardized benchmarks and real-world document sets. Our testing dataset included 847 financial documents, 312 legal forms, and 156 academic research publications to assess performance across critical enterprise use cases.


After processing nearly 500 million pages of enterprise documents, we've discovered that the biggest challenge in document AI isn't character recognition or table extraction. It's something far more fundamental: understanding how information flows across page breaks, column boundaries, and interrupted sections.


curious how LLM hallucinations will work on logging info - gonna be a hard problem to solve


I assume this is mostly in regard to our anomaly/error detection! Deterministic rules for flagging anomalies, plus human feedback, help us adapt the flagging system over time. So hallucinations won't directly impact flagged anomalies. The rules (patterns) we generate are on the stricter end, so they err on the side of flagging more.

However, the rules themselves aren't deterministically generated (and are therefore prone to LLM hallucinations). To address this, we currently have a simpler system that lets you mark incorrectly flagged anomalies so they can be incorporated back into our generated rules. There's plenty of room to improve here, and we're actively working on it: exposing our generated patterns in a human-digestible form (so they can be corrected), introducing metrics and more data sources for context, and connecting with a codebase.
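To make the idea concrete, here's a minimal sketch of what strict pattern-based flagging with a false-positive feedback loop can look like. The rule names, patterns, and fingerprinting scheme here are illustrative assumptions, not our actual system:

```python
import re

# Strict, deterministic rules: err on the side of flagging more.
RULES = {
    "error_keyword": re.compile(r"\b(ERROR|FATAL|Traceback)\b"),
    "timeout": re.compile(r"timed? ?out", re.IGNORECASE),
}

# (rule name, line fingerprint) pairs a human has marked as false positives.
suppressed = set()

def fingerprint(line: str) -> str:
    # Collapse digits so repeated variants of the same message match.
    return re.sub(r"\d+", "#", line.strip())

def flag(line: str) -> list[str]:
    """Return the rules that fire on a log line, minus suppressed ones."""
    return [
        name
        for name, pattern in RULES.items()
        if pattern.search(line) and (name, fingerprint(line)) not in suppressed
    ]

def mark_false_positive(rule: str, line: str) -> None:
    """Human feedback: stop flagging this rule for lines like this one."""
    suppressed.add((rule, fingerprint(line)))
```

The key property is that the flagging step itself is deterministic; an LLM can propose the patterns, but a hallucinated rule only ever over-flags, and human feedback prunes it.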


as builders in this space, we decided to put it to the test on complex nested tables, pie charts, etc. to see if the same VLM hallucination issues persist, and to what degree. while results were promising, we found several critical failure modes across two document domains.

check out our blog post here! https://www.runpulse.com/blog/beyond-the-hype-real-world-tes...


We put Mistral AI's new OCR to the test against complex documents that matter for real business use cases. While it outperforms a lot of frontier LLMs, we found critical limitations for finance, legal, and healthcare domains.


You're probably better off linking this in the ongoing thread that's sitting on the top of the front page right now.


claude is definitely better than gpt -- but both have their flaws! they pretty much fall flat on their face with nested entries, low-fidelity images, etc. (we detailed this heavily in our blog post here [1])

other ocr providers are doing a great job - we personally believe we have the highest-accuracy tool on the market. we're not here to dunk on anyone, just to provide unbiased feedback when putting new document extraction tools through a challenge.

[1]: https://www.runpulse.com/blog/why-llms-suck-at-ocr


to keep the read from getting too long, we only included one example. we tried over 50 docs and found a couple with pie charts/bar graphs that weren't parsed at all. there were also a few instances where entire column entries were incorrect due to mismatching.
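for anyone curious, column mismatches of this kind are often catchable with a cheap type-consistency check on the extracted table: if a column that should be numeric suddenly contains dates or free text, the extraction probably shifted. a rough sketch (column contents and threshold are made up for illustration):

```python
def column_type_ok(values, predicate, min_ratio=0.9):
    """True if at least min_ratio of a column's cells satisfy the predicate."""
    if not values:
        return True
    ok = sum(1 for v in values if predicate(v))
    return ok / len(values) >= min_ratio

def is_number(s: str) -> bool:
    try:
        float(s.replace(",", ""))
        return True
    except ValueError:
        return False

# An "amount" column where one cell was shifted in from a date column.
amounts = ["1,200.50", "300.00", "2024-01-03", "15.75"]
print(column_type_ok(amounts, is_number))  # fails the consistency check
```

it won't catch a number swapped for another number, but it flags the gross shifts silently and cheaply.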

