Same here. I mostly deal with text analytics. Text PDFs don't cause many issues unless a crazy font is used, but 2-column pages are a nightmare.
In case you are looking for an API to extract structure-rich content like tables from PDFs or images, look into https://extracttable.com (P.S. I contributed to it).
I'm a cricket fan and used to watch most competitive matches live, except when I'm in the office, where streaming sites are restricted and there is also the fear of being easily caught because of the bandwidth consumption from my system. So during office hours I follow text commentary sites like cricbuzz and cricinfo, but switching browser tabs is inevitable when you want to follow a live feed or sports text commentary. This browser extension reads the latest update (feed) for you so that you can continue working.
The design is slick, I like it. I run https://notyce.me, which I believe might complement your app, or vice versa. If you feel a collaboration could make things better, could you drop me an email at the address mentioned on my profile? Thanks.
Shameless plug: Would you like to join the club of happy customers at https://extracttable.com? It's an API to extract tabular data from images and PDFs without worrying about coordinates.
Suggestion: use a simpler synonym for "idempotent" in your high-level pricing overview. Something like "automatically makes sure you aren't charged for duplicates".
That looks like a great, comprehensive toolkit for data extraction. I understand the bundle is licensed under Apache, and I'm curious about the requirements/rules to follow to include a service like Abbyy.
We at extracttable.com (extract tabular data from images and PDFs over an API) are interested in contributing and integrating the service into the bundle.
Managed to allocate 4 hrs/week for a sport
After 4 years of trial and error, finally launched a product that is generating revenue
Next year's goals
continue the 4 hrs/week for a sport to sweat
join a startup by the first quarter
restrict to 10-15 hrs/week for the current side product
work on another side project