It's pretty much all bespoke.

I use external libraries for parsing HTML (JSoup) and for parsing robots.txt, but that's about it.
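For context, this is roughly what JSoup provides a crawler out of the box: fetch a page, pull out the text for indexing, and collect the outbound links. A minimal sketch only, not Marginalia's actual code; the URL is a placeholder, and the robots.txt check would be handled by the separate library, which isn't named in the thread:

    import org.jsoup.Jsoup;
    import org.jsoup.nodes.Document;
    import org.jsoup.nodes.Element;

    public class ParseExample {
        public static void main(String[] args) throws Exception {
            // A robots.txt check (handled by the separate library) would gate this fetch.
            Document doc = Jsoup.connect("https://www.marginalia.nu/").get();

            // Text for the index ...
            System.out.println(doc.title());
            System.out.println(doc.body().text());

            // ... and outbound links for the crawl queue.
            for (Element anchor : doc.select("a[href]")) {
                System.out.println(anchor.absUrl("href"));
            }
        }
    }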



What starting site did you feed to the crawler to follow links from when building the index?


Just my (Swedish) personal website. The first iteration of the search engine was probably mainly seeded by these links:

https://www.marginalia.nu/00-l%C3%A4nkar/

But I've since expanded my websites, so now I think these play a decent role in later iterations, although virtually all of them are pages I've found eating my own dogfood (a minimal seeding sketch follows the links below):

https://memex.marginalia.nu/links/fragments-old-web.gmi

https://memex.marginalia.nu/links/bookmarks.gmi
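A rough sketch of how seed pages like these could bootstrap a crawl: fetch each seed, extract its outbound links with JSoup, and queue anything not seen before. This is an illustrative breadth-first outline with a made-up page budget, not the engine's actual crawler, and it skips politeness delays and robots.txt handling:

    import java.util.ArrayDeque;
    import java.util.HashSet;
    import java.util.List;
    import java.util.Queue;
    import java.util.Set;

    import org.jsoup.Jsoup;
    import org.jsoup.nodes.Element;

    public class SeededCrawl {
        public static void main(String[] args) {
            // Seed URLs taken from the links pages mentioned in the thread.
            List<String> seeds = List.of(
                    "https://www.marginalia.nu/00-l%C3%A4nkar/");

            Set<String> seen = new HashSet<>(seeds);
            Queue<String> frontier = new ArrayDeque<>(seeds);
            int budget = 100; // illustrative limit, not a real setting

            while (!frontier.isEmpty() && budget-- > 0) {
                String pageUrl = frontier.poll();
                try {
                    // Queue each newly discovered absolute link exactly once.
                    for (Element anchor : Jsoup.connect(pageUrl).get().select("a[href]")) {
                        String next = anchor.absUrl("href");
                        if (!next.isEmpty() && seen.add(next)) {
                            frontier.add(next);
                        }
                    }
                } catch (Exception e) {
                    // Skip pages that fail to fetch or parse.
                }
            }
            System.out.println("Discovered " + seen.size() + " URLs");
        }
    }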



