
98% sounds good enough for the use case suggested here.


Writing good validators for data is hard. You can be 100% sure that there will be bad data in those 98%. From my own experience: I thought I had 50% of the books converted correctly, and then I found I still had junk data and gave up. It is not an impossible problem; I just was not motivated to fix it on my own. Working with your own copies is fine, but when you try to share them you run into legal issues that I just do not find that interesting to solve.

Edit: my point is that I would like to share my work, but that is hard to do in a legal way. That is the main reason I gave up.
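
To make the "junk data" problem concrete, here is a minimal sketch of the kind of heuristic validator the parent is describing for converted book text. The record shape (a "body" field), the function names, and the thresholds are all assumptions for illustration, not anything from the original comment, and real junk detection would need far more rules than this.

  # Heuristic junk-data check for converted book text (illustrative only).
  import unicodedata

  def looks_like_junk(text: str,
                      min_length: int = 200,
                      max_replacement_ratio: float = 0.01,
                      max_control_ratio: float = 0.005) -> bool:
      """Return True if the text shows common signs of a bad conversion."""
      if len(text) < min_length:
          return True  # suspiciously short for a whole book
      replacement = text.count("\ufffd")  # U+FFFD left behind by broken decoding
      control = sum(1 for c in text
                    if unicodedata.category(c) == "Cc" and c not in "\n\r\t")
      return (replacement / len(text) > max_replacement_ratio
              or control / len(text) > max_control_ratio)

  def validate_records(records):
      """Split records into (clean, junk) lists using the heuristic above."""
      clean, junk = [], []
      for record in records:
          (junk if looks_like_junk(record.get("body", "")) else clean).append(record)
      return clean, junk

Checks like these catch obvious mojibake and truncated files, but they say nothing about subtler problems (swapped pages, OCR word salad that is still valid Unicode), which is why "I thought I had 50% converted correctly" can still turn out to be wrong.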


2% garbage, if some of that garbage lands in the wrong places, is more than enough to seriously degrade search result quality.


It's better than nothing, and nothing is what we currently have.



