Since we recently had a baby and we’re still adjusting to their schedule, I’m still working on the same project, Librario[1]. Librario is a simple book metadata aggregation API written in Go. It fetches information about books from multiple sources, merges everything intelligently, and then saves it all to a PostgreSQL database for future lookups.
You can think of it as a data source, or a knowledgeable companion that can provide comprehensive book information for online booksellers, libraries, book-related startups, bookworms, and more.
I got a pre-alpha build running for those who want to test it out[2], but the code isn’t public yet, as there are a few things I want to refactor. I wrote comprehensive documentation for it this weekend; now I need to refactor the merger package with some new rules and write something to decrease the number of genres returned.
Been tough to find time to work on it because of the baby, but AI has been helping a lot to speed things up, and the work has been quite fun. Not sure if there will be interest in the idea, but it solves a problem I have, so I had to work on it anyway.
Hope to have the code on GitHub by the end of this week. AGPL licensed.
The recommendations are pretty good; even though I only input six books, it was enough for it to recommend books I have on my wish list. Definitely going to play around some more. Plus, the website is super fast, very impressive.
Any chance we could get an API going at some point? Are you planning to open source the work?
I'm interested in the scraping of Goodreads too. I'm building a book metadata aggregation API and plan on building a scraper for Goodreads, but I imagine using a data center IP address will become a problem very fast. Were you scraping from your home network?
Thank you for the compliments :) I used 50-100 datacenter proxies. I just logged requests made by the iOS app with Charles and then recreated the headers to the best of my ability, though the server did not seem to be very strict at all. Worth noting, though, that static residential proxies are not too expensive these days anyway.
Re the API: The model does actually run fairly well on CPU so it probably wouldn't be too expensive to serve. I guess if there is demand for it I could do it. I think most social book sites would probably like to own their recommendation system though.
Speaking of sustained scraping for AI services, I found a strange file on your site: https://book.sv/robots.txt. Would you be able to explain the intent behind it?
I didn't want an agent to get stuck in an infinite loop invoking endpoints that cost GPU resources. Those fears are probably unfounded, so if people really cared I could remove those. /similar is blocked by default because I don't want 500,000 "similar books for" pages to pollute the search results for my website, but I don't mind if people scrape those pages.
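For anyone unfamiliar, blocking a path like that while leaving the rest of the site crawlable only takes a couple of lines. This is an illustrative sketch, not the actual file at book.sv:

```
User-agent: *
Disallow: /similar
```

Well-behaved crawlers honor the Disallow rule; it doesn't stop a determined scraper, which matches the "I don't mind if people scrape those pages" stance.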
Never heard of “property-based testing” before. Coming from Go, I mostly use table-driven and fuzz tests.
Is this approach something that makes sense in Go as well? I see at least a couple of packages[1][2] for Go, but I wonder whether the approach itself makes sense for the language.
Rapid is excellent. It also integrates with the standard library's fuzz testing, which is handy to persist a high-priority corpus of inputs that have caused bugs in the past.
testing/quick is adequate for small things and doesn't introduce new dependencies, but it's also frozen. Many years ago, the Go team decided that property-based testing is complex enough that it shouldn't live in the standard library.
I went from Bash to Go as a sysadmin, and eventually built something with Rust too, so I can definitely confirm that Go’s learning curve is gentler than Rust’s.
With my wife and me using our OpenRouter account via the API, we spend about $20 per month. Plus a Claude Pro subscription for Claude Code, which I’m meaning to cancel.
We mostly use Claude Sonnet and Gemini 2.5 Pro, with a bit of GPT-5 when I need design work, and Claude Haiku in Home Assistant.
I prefer to use chat instead of agents… I use AI generated code as a starting point, mostly, and chat works better for that.
You’re definitely an outlier. Aside from companies, I don’t know anyone spending that much, to be honest.
While working on Shelvica, a personal library management service and reading tracker, I realized I needed a source of data for book information, and none of the solutions available provided all the data I needed. One might provide the series, the other might provide genres, and yet another might provide a cover with good dimensions, but none provided everything.
So I started working on Librario, an ISBN database that fetches information from several other services, such as Hardcover.app, Google Books, and ISBNDB, merges that information, and returns something more complete than any of them alone. It also saves that information in the database for future lookups.
You can see an example response here[1]. Pricing information for books is missing right now because I need to finish the extractor for those, genres need some work[2], and having a 5-month-old baby makes development a tad slow, but the service is almost ready for a preview.
The algorithm that decides what to merge is the hardest part, in my opinion, and it's very basic right now: a priority-and-score system where different extractors have different priorities and different fields have different scores. Eventually, I wanna try doing something with machine learning instead.
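To make the priority idea concrete, here's a toy version of picking a single field's winner. Everything here (type names, the priority numbers, the per-source values) is made up for illustration, not Librario's actual rules:

```go
package main

import "fmt"

// Field is one candidate value for a book field, as reported by one source.
type Field struct {
	Source string
	Value  string
}

// sourcePriority ranks extractors; higher wins. Numbers are hypothetical.
var sourcePriority = map[string]int{
	"hardcover":   3,
	"isbndb":      2,
	"googlebooks": 1,
}

// mergeField picks the candidate from the highest-priority source that
// actually has a value. A fuller system would also weight per-field scores.
func mergeField(candidates []Field) (Field, bool) {
	best := Field{}
	bestScore := -1
	for _, c := range candidates {
		if c.Value == "" {
			continue // a source with no data shouldn't win on priority alone
		}
		if p := sourcePriority[c.Source]; p > bestScore {
			best, bestScore = c, p
		}
	}
	return best, bestScore >= 0
}

func main() {
	title, ok := mergeField([]Field{
		{Source: "googlebooks", Value: "The Hobbit"},
		{Source: "hardcover", Value: "The Hobbit, or There and Back Again"},
		{Source: "isbndb", Value: ""},
	})
	fmt.Println(ok, title.Source, title.Value)
}
```

The nice property of this shape is that adding a new extractor is just a new priority entry; the hard part, as noted above, is tuning priorities and per-field scores.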
I'd also like to add book summaries to the data somehow, but I haven't figured out a way to do this legally yet. For books in the public domain I could feed the entire book to an LLM and ask it to write a spoiler-free summary, but for other books, that'd land me in legal trouble.
Oh, and related books, and things of the sort. But I'd like to do that based on the information stored in the database itself instead of external sources, so it's something for the future.
Last time I posted about Shelvica some people showed interest in Librario instead, so I decided to make it something I can sell instead of just a service I use in Shelvica[3], which is why I'm focusing more on it these past two weeks.
[2]: In the example you'll see genres such as "English" and "Fiction In English", which is mostly noise. Also things like "Humor", "Humorous", and "Humorous Fiction" for the same book.
[3]: Which is nice, cause that way there are two possible sources of income for the project.
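Trimming the genre noise mentioned in [2] could start with a small normalizer that drops known junk labels and folds near-duplicates. The noise list and the suffix rules below are illustrative guesses, not the rules Librario will actually use:

```go
package main

import (
	"fmt"
	"sort"
	"strings"
)

// normalizeGenres drops noise labels and collapses near-duplicate genres.
func normalizeGenres(genres []string) []string {
	// Labels that carry no genre information (hypothetical list).
	noise := map[string]bool{
		"english":            true,
		"fiction in english": true,
	}
	seen := map[string]bool{}
	var out []string
	for _, g := range genres {
		key := strings.ToLower(strings.TrimSpace(g))
		if noise[key] {
			continue
		}
		// Fold variants like "Humorous" / "Humorous Fiction" into one
		// bucket keyed on "humor". Crude, but shows the idea.
		key = strings.TrimSuffix(key, " fiction")
		key = strings.TrimSuffix(key, "ous")
		if seen[key] {
			continue
		}
		seen[key] = true
		out = append(out, g)
	}
	sort.Strings(out)
	return out
}

func main() {
	fmt.Println(normalizeGenres([]string{
		"Humor", "Humorous", "Humorous Fiction", "English", "Fiction In English",
	}))
	// The five input labels collapse to just "Humor".
}
```

A real version would probably want a curated synonym table rather than suffix stripping, since rules like trimming "ous" will misfire on genres such as "Religious".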
I’m working on an ISBN database that fetches information from several other services, such as Hardcover.app, Google Books, and ISBNDB, merges that information, and returns something more complete than any of them alone. It also saves that information in the database for future lookups.
Mostly because I’m working on a personal library management service called Shelvica to solve my own problems[1], and none of those services provided all the information on a book. One might provide the series, the other might provide genres, and yet another might provide a cover with good dimensions, but none provided everything, so I decided to work on something of my own (called Librario).
While Shelvica is the focus, Librario could become its own thing in time, so I don’t mind the sidetracking.
I also plan on having an “ISBN Search” kind of website that feeds from that database as a way to let users search for information about books, which then feeds the service’s database, making it stronger for Shelvica.
I open source everything I make, but I’m still wondering if these will be open sourced or not. I’ll probably go with the EUPL 1.2 license if I do decide on open sourcing them.
[1]: My wife and I have a personal library with around 1800 books, but most applications for management are either focused on ebooks or choke with this many books. Libib is the exception, but I wanted a little more.
Haven’t had the time yet, but it’s on my todo list. I have extractors for Google Books, Hardcover.app, and ISBNDB already working, and Amazon, Goodreads, and Anna’s Archive on the todo list.
I do plan on including a link to the book on Anna’s Archive in the “ISBN Search” website. At least to the search page with the filters already filled.
Hey I'd like to learn more about what you're doing. I'm working on a tangentially related service but focusing on audiobooks. One big stumbling block I ran into early on was trying to find something close to a unified ISBN datasource.
If you're up for it, shoot me an email at charles@geuis.com.
I attempted something like this because I wanted a good book search service that gave me the at-a-glance information I needed from Storygraph and Goodreads. The main things I look for when I search for a book are genres/Storygraph's "moods", number of pages, whether it's part of a series, rating across services, and how much it costs.
The algorithm to decide what to merge is the hardest part, in my opinion, and very basic right now. Eventually, I wanna try doing something with machine learning. Definitely a fun thing to work on, though.
Having a full-time job and a baby to take care of makes progress slow, but I should have the website ready soon. Shoot me an email and I can let you know when that happens; email is on my profile.
Service isn't online yet, but you can find me on Sourcehut[1] and on GitHub[2], where I'll probably be posting the service soon. You can also email me[3] if you have a request or a specific use case I might be able to support.
[1]: https://github.com/pagina394/librario
[2]: https://paste.sr.ht/~jamesponddotco/5612eaa80fc7eee8b6180a31...