Hacker News
Ask HN: Are there any AI-enhanced solutions for converting PDFs to ePub?
7 points by 2big2fail_47 on Sept 11, 2023 | 5 comments
I read numerous scholarly articles and papers, primarily as PDFs and occasionally as scanned books. Is anyone here aware of an OCR solution, possibly AI-enhanced, that converts these documents to ePub? I've found that tools like Calibre often struggle to preserve the formatting. It strikes me that the potential of ePub and E-ink readers is badly underestimated across the software world.


Try https://vectorizer.ai/. Its main purpose is to convert images (rasters) into vectors (SVG), but it has great line detection and color acquisition, so give it a go. Bear in mind that this will only work for images, so you will first need to convert the PDF into an image such as a PNG.

I know it seems an odd suggestion, but I had to try different vectorization methods for a project of mine, and this one performed very well with low-resolution images. Hope it helps.


Thanks. That sounds like a nice webapp. In my case it's easier to copy the text from Preview on the Mac. Since M1 it does OCR quite well. This way I don't have to convert the PDF to PNG. Still, I'd love a fully automated way.


What are the issues you're having with Calibre? I've found it to preserve the formatting I'm looking for. What formatting are you talking about that it's not preserving?


When the PDF is very clearly structured, it works just fine. But if the layout consists of multiple columns and complex formatting, the output gets very imprecise. If the material is scanned, it won't work at all.


That is really hard, as there are no such things as columns in PDFs, only text starting at different (x,y) positions.

Hence most (if not all) programs export the text in the order it appears in the file.

And if it is scanned, there is no text at all (but you could OCR it).
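
To make the (x,y) point concrete, here is a minimal sketch of why file order breaks on multi-column pages and how a simple heuristic can recover reading order. It assumes the text fragments have already been extracted as (x, y, text) tuples with y growing downward. The sample fragments, the `reading_order` helper, and the `column_gap` threshold are all made up for illustration and are not how Calibre or any particular tool works.

```python
from typing import List, Tuple

# (x, y, text); assumption: y grows downward, as in many extraction tools.
Fragment = Tuple[float, float, str]


def reading_order(fragments: List[Fragment], column_gap: float = 50.0) -> List[str]:
    """Group fragments into columns by x position, then read each column top to bottom."""
    if not fragments:
        return []
    # Sort by x so fragments from the same column end up next to each other.
    by_x = sorted(fragments, key=lambda f: f[0])
    columns: List[List[Fragment]] = [[by_x[0]]]
    for frag in by_x[1:]:
        # A jump in x larger than column_gap is treated as the start of a new column.
        if frag[0] - columns[-1][-1][0] > column_gap:
            columns.append([frag])
        else:
            columns[-1].append(frag)
    ordered: List[str] = []
    for col in columns:
        # Within a column, read from top to bottom.
        ordered.extend(text for _, _, text in sorted(col, key=lambda f: f[1]))
    return ordered


# Hypothetical fragments in "file order": rows interleave across two columns.
fragments = [
    (72.0, 100.0, "Left 1"),
    (320.0, 100.0, "Right 1"),
    (72.0, 120.0, "Left 2"),
    (320.0, 120.0, "Right 2"),
]

print([text for _, _, text in fragments])  # naive export: file order
print(reading_order(fragments))            # column-aware order
```

A fixed gap threshold like this falls apart on real layouts (indented paragraphs, figures, footnotes, headings spanning both columns), which is part of why the problem is hard.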



