
> It's trivial to normalise the various formats,

Ha. Ha. ha ha ha.

As someone who has pretty broadly tried to normalize a pile of books and documents I have legitimate access to: no, it is not.

You can get good results 80% of the time, usable but messy results 18% of the time, and complete garbage the remaining 2%. More effort seems to yield only marginal improvements.

98% sounds good enough for the use case suggested here.

Writing good validators for data is hard. You can be 100% sure there will be bad data in that 98%. From my own experience: I thought I had 50% of the books converted correctly, then found I still had junk data and gave up. It is not an impossible problem; I just was not motivated to fix it on my own. Working with your own copies is fine, but when you try to share them you run into legal issues that I just do not find interesting to solve.
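To make the point concrete, a validator here tends to be a pile of heuristics like the sketch below (plain stdlib Python, thresholds made up for illustration); rules like these are exactly what junk data slips past, which is why you only discover it later.

  import unicodedata

  def looks_like_junk(text: str, min_chars: int = 500) -> bool:
      """Heuristically flag text that is probably a failed conversion, not prose."""
      if len(text) < min_chars:            # truncated or empty chapter
          return True
      replacement = text.count("\ufffd")   # U+FFFD marks bad decoding
      control = sum(1 for ch in text
                    if unicodedata.category(ch) == "Cc" and ch not in "\n\r\t")
      letters = sum(ch.isalpha() for ch in text)
      # Visible decoding damage, or mostly non-letter content, suggests garbage.
      return (replacement > 0
              or control / len(text) > 0.01
              or letters / len(text) < 0.5)

  print(looks_like_junk("\ufffd" * 600))                 # True: mojibake page
  print(looks_like_junk("plain readable prose " * 40))   # False: ordinary text

Checks like these catch the obvious failures but say nothing about subtler ones (dropped footnotes, merged paragraphs, wrong chapter order), which is where the remaining junk hides.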

Edit: my point is that I would like to share my work, but that is hard to do legally. That is the main reason I gave up.


2% garbage, if that garbage ends up in the wrong places, is more than enough to seriously degrade search result quality.

It's better than nothing, and nothing is what we currently have.


