
Google Scholar explicitly made direct deals with publishers to scrape their content, with the constraint that they can use the content to serve search results in Scholar but cannot show the content of the papers on the site, just titles and short fragments that match. The deals were tenuous, and I had to step carefully around my plan to use that database to implement large-scale scientific search over the literature (this was a long time before anybody was seriously considering using LLMs on research data).

I've spoken to several very wealthy/powerful people and tried to get them to negotiate a large-scale content license with the various publishers that would allow researchers and individuals to access more research in lower-friction ways. None of them (NIH, Schmidt, etc.) were really interested.


