Hacker News
Ask HN: How to digitize my relative's paper books on a small budget
1 point by ThinkBeat on Feb 8, 2022 | 4 comments
My great-grandfather was an author, and he wrote quite a few books.

He was never famous or successful, but he made a living as an author.

I have most of his books that have been handed down. They are old and fragile.

The books do not exist in any eBook format and I would like to create an eBook of each of his paper books.

These can then be made available free of charge on the net and be given to libraries, if they are at all interested (probably not).

The process must be nondestructive.

Ripping out each page and feeding it through a scanner is probably the fastest and perhaps easiest way of doing it, but I will not do that.

I have a flatbed scanner and a document scanner and a cellphone.

I have tried to arrange a stand above the book to keep it at a fixed distance from a camera. I have problems with the pages, since they do not lie flat, and how badly they curve depends on how far into the book I am.

The first issue then is developing a process to copy/scan each page. I am stuck here.

Then the next step would be compiling them, adjusting the scans, correcting flaws, and maybe rescanning.

Then some form of OCR

and then actually making an eBook out of it all.

I can make a regular PDF file, but I have never tried to make an EPUB file and am not sure how to do that.

When I am done, I want the books to be in as good a shape as they were when I started.

Any tips?



Image processing and OCR is a problem space that interests me and that I have quite a bit of prior work experience in. I can't help you with the scanning itself but I'd be interested in taking a crack at processing the images you get to see if it's possible to get high-quality OCR from the deformed images that you might get from a simpler scanning setup.

Having someone else who could benefit from my work might help with my motivation to pursue this sort of project, which I've been having a bit of trouble with lately. If you'd like to discuss this further my e-mail is in my public profile.
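For what it's worth, whatever OCR engine ends up being used, the raw text usually comes out with the printed line breaks intact, so some cleanup is needed before it can go into an eBook. Below is a rough sketch of that one step, using only the Python standard library. The function name `clean_ocr_text` is made up for illustration, and the rules are assumptions about what typical OCR output looks like (hard line breaks, words hyphenated at line ends, blank lines between paragraphs), not something tied to any particular OCR tool.

```python
import re


def clean_ocr_text(raw: str) -> list[str]:
    """Turn line-broken OCR output into a list of paragraph strings.

    Assumptions (not guaranteed for every OCR engine):
    - paragraphs are separated by one or more blank lines;
    - a hyphen at the very end of a line marks a word split across lines.
    Caveat: genuine compounds broken at the hyphen ("well-" / "known")
    are merged too, so the result still needs proofreading.
    """
    # Normalise Windows and old Mac line endings to "\n".
    text = raw.replace("\r\n", "\n").replace("\r", "\n")

    # Rejoin words hyphenated across a line break: "frag-\nile" -> "fragile".
    text = re.sub(r"(\w)-[ \t]*\n[ \t]*(\w)", r"\1\2", text)

    paragraphs = []
    # Blank lines (possibly containing stray whitespace) separate paragraphs.
    for block in re.split(r"\n\s*\n", text):
        # Join the remaining hard-wrapped lines with single spaces.
        lines = [line.strip() for line in block.split("\n") if line.strip()]
        joined = re.sub(r"\s+", " ", " ".join(lines))
        if joined:
            paragraphs.append(joined)
    return paragraphs
```

The output is deliberately just a list of paragraphs, so it can be fed into whatever eBook tooling is chosen later.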


What an excellent project! I would have thought that flattening the books enough to scan them would damage or destroy them. Google developed some software for scanning curved pages (so as not to damage fragile books) and then flattening the images, but I don't know if that is publicly available. Have you thought about retyping, or having them retyped, as a solution? You could type from photographs of the pages (the page curve wouldn't matter then, so long as the text is legible). A 150,000-word book might cost you $250 to have typed via, e.g., Upwork (or 50 hours of your time if you can touch type reasonably). You could QC yourself. You'd then be in a position to typeset and make an EPUB. (Text/LaTeX/HTML -> EPUB is a fairly trivial process; there are open source and free options.) Hope this helps - good luck!
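To give a feel for how small the text-to-EPUB step can be, here is a minimal sketch that builds an EPUB 3 file by hand with Python's standard `zipfile` module. It is only an illustration, not a recommendation over existing free tools: the function name `build_epub`, the `OEBPS/` folder, the `chN.xhtml` file names, and the placeholder identifier and modification date are all assumptions chosen for the example, and a real book would want a proper unique ID, a cover, and styling.

```python
import zipfile
from html import escape

CONTAINER_XML = """<?xml version="1.0" encoding="UTF-8"?>
<container version="1.0" xmlns="urn:oasis:names:tc:opendocument:xmlns:container">
  <rootfiles>
    <rootfile full-path="OEBPS/content.opf" media-type="application/oebps-package+xml"/>
  </rootfiles>
</container>
"""

XHTML_HEAD = ('<?xml version="1.0" encoding="UTF-8"?>\n'
              '<html xmlns="http://www.w3.org/1999/xhtml" '
              'xmlns:epub="http://www.idpf.org/2007/ops">\n')


def build_epub(target, title, author, chapters, language="en",
               book_id="urn:uuid:00000000-0000-0000-0000-000000000000",  # placeholder
               modified="2022-02-08T00:00:00Z"):  # placeholder timestamp
    """Write a bare-bones EPUB 3 file.

    target   -- output path or writable binary file object
    chapters -- list of (heading, [paragraph, ...]) tuples of plain text
    """
    manifest, spine, toc = [], [], []
    with zipfile.ZipFile(target, "w", zipfile.ZIP_DEFLATED) as zf:
        # The mimetype entry must come first and must be stored uncompressed.
        zf.writestr("mimetype", "application/epub+zip",
                    compress_type=zipfile.ZIP_STORED)
        zf.writestr("META-INF/container.xml", CONTAINER_XML)

        for n, (heading, paragraphs) in enumerate(chapters, start=1):
            name = f"ch{n}.xhtml"
            body = "\n".join(f"<p>{escape(p)}</p>" for p in paragraphs)
            zf.writestr(f"OEBPS/{name}",
                        f"{XHTML_HEAD}<head><title>{escape(heading)}</title></head>\n"
                        f"<body>\n<h1>{escape(heading)}</h1>\n{body}\n</body>\n</html>\n")
            manifest.append(f'<item id="ch{n}" href="{name}" '
                            'media-type="application/xhtml+xml"/>')
            spine.append(f'<itemref idref="ch{n}"/>')
            toc.append(f'<li><a href="{name}">{escape(heading)}</a></li>')

        # Navigation document (the table of contents readers display).
        zf.writestr("OEBPS/nav.xhtml",
                    f"{XHTML_HEAD}<head><title>Contents</title></head>\n"
                    '<body>\n<nav epub:type="toc">\n<h1>Contents</h1>\n<ol>\n'
                    + "\n".join(toc) + "\n</ol>\n</nav>\n</body>\n</html>\n")

        # Package document: metadata, list of files, and reading order.
        zf.writestr("OEBPS/content.opf", f"""<?xml version="1.0" encoding="UTF-8"?>
<package xmlns="http://www.idpf.org/2007/opf" version="3.0" unique-identifier="bookid">
  <metadata xmlns:dc="http://purl.org/dc/elements/1.1/">
    <dc:identifier id="bookid">{escape(book_id)}</dc:identifier>
    <dc:title>{escape(title)}</dc:title>
    <dc:creator>{escape(author)}</dc:creator>
    <dc:language>{escape(language)}</dc:language>
    <meta property="dcterms:modified">{escape(modified)}</meta>
  </metadata>
  <manifest>
    <item id="nav" href="nav.xhtml" media-type="application/xhtml+xml" properties="nav"/>
    {" ".join(manifest)}
  </manifest>
  <spine>
    {" ".join(spine)}
  </spine>
</package>
""")
```

Calling something like `build_epub("book.epub", "Some Title", "Author Name", [("Chapter 1", ["First paragraph.", "Second paragraph."])])` should produce a file that EPUB readers can open. It is worth running the result through an EPUB validator before handing it to a library.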


I always thought the Treventus ScanRobot was an elegant way to scan books:

https://www.youtube.com/watch?v=SdipuAuWsEs

It looks like it's sucking data right out of the book. Look for other videos of the process.


This came up from a quick search: https://diybookscanner.org/



