
Co-founder of doctly.ai here (OCR tool)

I love Mistral and what they do. I got really excited about this, but was a little disappointed after my first few tests.

I tried a complex table that we use as a first test of any new model, and Mistral OCR decided the entire table should just be extracted as an 'image' and returned this markdown:

``` ![img-0.jpeg](img-0.jpeg) ```

I'll keep testing, but so far, very disappointing :(

The document I tried is the entire reason we created Doctly to begin with. We needed an OCR tool for the regulatory documents we work with, and nothing could really give us the right data.

Doctly uses a judge: it OCRs a document with multiple LLMs and decides which output to pick. It keeps re-running a page until the judge's score clears a threshold.
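Roughly, the loop looks like this (a simplified sketch: `ocr_with` and `judge_score` stand in for our real model calls and judge prompt, and the model names and threshold here are just placeholders):

```
def ocr_with(model: str, page_image: bytes) -> str:
    """Ask one LLM to transcribe the page to markdown (stub)."""
    raise NotImplementedError

def judge_score(page_image: bytes, markdown: str) -> float:
    """Ask a judge LLM to rate a transcription from 0.0 to 1.0 (stub)."""
    raise NotImplementedError

MODELS = ["gemini", "gpt-4o", "claude"]   # placeholder names
THRESHOLD = 0.9                           # stop once the judge is satisfied
MAX_ROUNDS = 3                            # cap the re-runs per page

def transcribe_page(page_image: bytes) -> str:
    best, best_score = "", float("-inf")
    for _ in range(MAX_ROUNDS):
        for model in MODELS:
            candidate = ocr_with(model, page_image)
            score = judge_score(page_image, candidate)
            if score > best_score:
                best, best_score = candidate, score
        # Re-run the page only while the judge is still unhappy.
        if best_score >= THRESHOLD:
            break
    return best
```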

I would have loved to add this into the judge list, but might have to skip it.



Where did you test it? At the end of the post they say:

> Mistral OCR capabilities are free to try on le Chat

but when asked, Le Chat responds:

> can you do ocr?

> I don't have the capability to perform Optical Character Recognition (OCR) directly. However, if you have an image with text that you need to extract, you can describe the text or provide details, and I can help you with any information or analysis related to that text. If you need OCR functionality, you might need to use a specialized tool or service designed for that purpose.

Edit: Tried anyway by attaching an image; it said it could do OCR and then output... completely random text that had absolutely nothing to do with the text in the image! Concerning.

Tried again with a higher-definition image; it output only the first twenty words or so of the page.

Did you try using the API?


Yes I used the API. They have examples here:

https://docs.mistral.ai/capabilities/document/

I sent a base64 encoding of the image of the PDF page. The output is an object containing the markdown and coordinates for the images:

```
[OCRPageObject(
    index=0,
    markdown='![img-0.jpeg](img-0.jpeg)',
    images=[OCRImageObject(
        id='img-0.jpeg',
        top_left_x=140, top_left_y=65,
        bottom_right_x=2136, bottom_right_y=1635,
        image_base64=None)],
    dimensions=OCRPageDimensions(dpi=200, height=1778, width=2300))]
model='mistral-ocr-2503-completion'
usage_info=OCRUsageInfo(pages_processed=1, doc_size_bytes=634209)
```
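The call itself is only a few lines with the mistralai Python SDK. This is a sketch based on the docs linked above; double-check parameter names there (note the `mistral-ocr-latest` alias versus the `mistral-ocr-2503` string echoed back in the response):

```
import base64
from mistralai import Mistral  # pip install mistralai

client = Mistral(api_key="...")  # your API key

# Base64-encode an image of the PDF page.
with open("page.png", "rb") as f:
    b64 = base64.b64encode(f.read()).decode()

resp = client.ocr.process(
    model="mistral-ocr-latest",
    document={"type": "image_url",
              "image_url": f"data:image/png;base64,{b64}"},
)
print(resp.pages[0].markdown)  # for my page: just '![img-0.jpeg](img-0.jpeg)'
```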


Any luck with this? I'm trying to process photos of paperwork (.pdf, .png) and got the same results as you.

Feels like something is missing in the docs, or the API itself.

https://imgur.com/a/1J9bkml


Interestingly, I'm currently scanning the hundreds of journal papers my grandfather authored in medicine and thinking through what to do about graphs. I was expecting to do some form of multi-phase, agent-based generation of LaTeX or SVG rather than a verbal summary of the graphs; at least in his generation of authorship, his papers clearly explained the graphs already. I was naturally excited to see your post, but when I looked at the examples, what I saw was, effectively, a more verbose form of

``` ![img-0.jpeg](img-0.jpeg) ```

I'm assuming this is partly because your use case targets RAG under various assumptions, but also partly because multimodal models aren't yet near what I'd need to be successful?


We need to update the examples on the front page. Currently, anything considered a chart/graph/figure is converted to a description; things like logos or photos get an image tag. You can also choose to exclude them entirely.

The difference here is that Mistral treated the entire page as an image tag (it's just a table of text in my document) rather than being more selective.

I do like that they give you coordinates for the images, though; we need to do something like that.

Give the actual tool a try; would love to get your feedback for that use case. It gives you 100 free credits initially, but if you email me ([email protected]) I can give you an extra 500 (goes for anyone else here too).


If you have a judge system and Mistral performs well on other tests, wouldn't you want to include it, so that whenever it scores highest in your judge's ranking, the most accurate result gets selected? Or are you saying that Mistral's image markdown would score higher with your judge?


We'll definitely be doing more tests, but results like the one I got on the complex test would score low, and including the model might not be worth the extra cost of the judging itself.

In our current setup Gemini wins most often. We enter multiple generations from each model into the 'tournament'; sometimes one generation from Gemini ends up at the top while another lands at the bottom of the same tournament.
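Concretely, each generation competes on its own, so the same model can place both first and last. An illustrative sketch, reusing the stand-in `ocr_with`/`judge_score` helpers and `MODELS` list from my sketch above (not our actual code):

```
def run_tournament(page_image: bytes, generations_per_model: int = 3):
    # Every (model, generation) pair is an independent entrant.
    entrants = [
        (model, ocr_with(model, page_image))
        for model in MODELS
        for _ in range(generations_per_model)
    ]
    # Rank entrants by judge score alone; model identity is ignored,
    # so two Gemini generations can land at opposite ends of the ranking.
    return sorted(
        entrants,
        key=lambda entrant: judge_score(page_image, entrant[1]),
        reverse=True,
    )
```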


Does doctly do handwritten forms like dates?

I have a lot of "This document filed and registered in the county of ______ on ______ of _____ 2023" sort of thing.


We've been getting great results with those as well. But of course there is always some chance of not getting it perfect, especially with varied handwriting.

Give it a try; no credit card needed. If you email me ([email protected]) I can give you extra free credits for testing.


Just tried it. Got all the dates correct and even extracted signatures really well.

Now to figure out how many millions of pages I have.


How do you stay competitive at $2 per 100 pages when Mistral and others offer approximately 1000 pages for $1?


Customers are willing to pay for accuracy that existing solutions don't deliver. We started out needing an accurate solution for a RAG product we were building, and none of the solutions we tried provided the accuracy we needed.


Why pay more for Doctly than for AWS Textract?


I did not try Doctly, but AWS Textract does not support Russian, which is my use case, so the output is completely useless.


Great question. The language models are definitely beating the old tools. Take a look at Gemini for example.

Doctly runs a tournament-style judge: it runs multiple generations across LLMs and picks the best one, outperforming any single generation from a single model.


Would love to see the test file.


Would be glad to see benchmarking results.


This is a good idea. We should publish benchmark results and comparisons.



