LibriVox: Free Public Domain Audiobooks (librivox.org)
286 points by drummer on June 18, 2020 | 81 comments


Hey, so I'm the very part-time volunteer sysadmin for LibriVox. Can I use this opportunity to ask for help?

I have a day job and two kids. The amount of time and energy I give to LibriVox is enough to keep the lights on, but not much else.

We don't have money; in fact, we don't even have a legal entity. Any donations we get are handled by the Internet Archive, who also kindly provide our two servers (yes, only two).

If you want to help and know PHP and CodeIgniter, we'd be very happy to have you on board! While I am a developer, I work in Python, and nothing web-related. LibriVox's tech stack has fallen woefully out of date (PHP 5.6, CodeIgniter 2), and I can't bring it up to date all by myself.

I'll be honest, it's not glamorous work. There's no automated testing, anything we change has to be tested and validated by the volunteers themselves - who are awesome, by the way. But we're a fantastic little nugget of Internet, and I think we should stick around on a solid tech stack for as long as possible ;)

All of our code is on GitHub: https://github.com/LibriVox/


While checking out the project on GitHub, I noticed that you're using GerritHub for reviews, and it made me wonder whether this is holding some people back from contributing. Not that it's bad in any way, but I assume it's something a lot of people are unfamiliar with, the description of how to use it for the project is very limited, and work in progress isn't directly visible as PRs on GitHub.

Additionally, you mention in the CONTRIBUTING.md that there's a Google doc with a lot of issues listed that should be converted to GitHub issues. Where does one find that Google doc?


I'm not married to Gerrit, and I concede that it could be a hurdle. It's what I use for my day job, so I chose it out of laziness and familiarity more than anything else. I'll probably drop it, and learn the GitHub PR tools and workflow - I've gotten a few random PRs already.

Edit: I'll talk to the volunteers about making that Google doc public, good suggestion.


I think one can go either way, but the important part is that it's clear to those willing to contribute. Explaining a bit more about how Gerrit reviews work and who actually reviews them would already help, and maybe add a link to the README so it's more easily discoverable.

Cool, I wouldn't mind copying/creating a bunch of issues on the side.


Done.

https://github.com/LibriVox/librivox-catalog/blob/master/CON...

Have fun, and many thanks!

(I'll try to make time this weekend to drop GerritHub as well, and convert what I currently have there to PRs)


Ah, I was actually browsing the volunteer page just now looking for a github link to contribute code or at least html/css.

I also use Python for work, not PHP. Does LibriVox plan to keep maintaining the current PHP codebase via a major version upgrade, or are you hoping to do a rewrite in a more easily maintainable way?


If someone's willing to put in the time to redo our code in Python, I wouldn't be opposed to that. However, keep in mind that it'd be a massive undertaking. Last time we re-wrote our apps, it took a full-time developer multiple months. Admittedly, everything was brand new, and included collecting requirements and validating the prototypes with the volunteers. So perhaps a "simple" rewrite, keeping all the logic the same, wouldn't be so hard.

Also keep in mind - a good chunk of the volunteers aren't tech-savvy, and are nervous when faced with changes. Just upgrading to a new phpBB version, with a new theme, was met with some grumbles.

I realize I'm a horrible salesman, but something about LibriVox just keeps me involved. I don't even participate in the audiobook side of it, just tech. I was initially brought on as a paid sysadmin for the project to modernize our code, the one that took that full-time developer months. When that ended, I stuck around as a volunteer.


If it's based on CodeIgniter, it should be pretty maintainable already. The documentation for each version has detailed instructions on how to upgrade the code to the next version, up to CI 3. CI 4, however, is a complete rewrite and probably needs a lot of work, though I haven't gotten that far yet. CI 4 also requires a much more recent version of PHP.


Even the WordPress theme is a little bit outdated...


My late father did a number of recordings for LibriVox.

In fact in the last conversation I had with him, he said to tell the people on the LibriVox forums "Cheerio" from him, so he had certainly enjoyed being part of that community.

After he died and I posted his farewell message there, it was quite emotional reading through the tributes to him and his skill as a storyteller. Later still I found a dozen or so "fan mail" letters - so he knew he was appreciated.

I'll always be grateful to LibriVox for giving him an avenue to exercise a talent that wouldn't otherwise have been public.


See, this is why I love these sorts of projects. Yes, as a casual person just reading the nerd news, it’s a useful resource.

But then you go on to realise it’s a resource that people have poured themselves into. I love it.


Your father's reading of Oedipus Rex is superb.

Please accept my thanks.


I thank you on his behalf. Most of his fan mail was for his reading of The Prisoner of Zenda, so perhaps try that if you haven't already and the genre is to your taste?


There are also a huge number of audiobooks available on YouTube.

I made a Reddit bot [0] that continually scans comments across the entire site for mentions of science fiction and fantasy book titles. It uses a dataset of book title/author pairs that were scraped from the Speculative Fiction Database, a compendium of all known science fiction/fantasy literary works.

If an SFF book title/author pair is detected in a comment, the bot searches YouTube for an audiobook of the mentioned title, then replies with a link to it should one be found.

[0] https://www.reddit.com/user/EmotionalField
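For anyone curious how the detect-then-search step described above might look, here is a minimal sketch. It assumes a simple matching rule (the title and the author's surname must both appear as whole words) and builds a plain YouTube results-page URL; the `BOOKS` sample, `find_book`, and `youtube_search_url` are hypothetical names, and the actual bot (code linked further down) may match differently or use the YouTube API instead.

```python
import re
from typing import Optional, Tuple
from urllib.parse import urlencode

# Hypothetical sample; the real bot reads thousands of pairs scraped
# from the Speculative Fiction Database.
BOOKS = [
    ("The Left Hand of Darkness", "Ursula K. Le Guin"),
    ("Dune", "Frank Herbert"),
]


def _has_phrase(text: str, phrase: str) -> bool:
    """Case-insensitive whole-word match, so 'Dune' does not match 'dunes'."""
    pattern = r"\b" + re.escape(phrase) + r"\b"
    return re.search(pattern, text, flags=re.IGNORECASE) is not None


def find_book(comment: str) -> Optional[Tuple[str, str]]:
    """Return the first (title, author) pair whose title and author surname both appear."""
    for title, author in BOOKS:
        surname = author.split()[-1]
        if _has_phrase(comment, title) and _has_phrase(comment, surname):
            return title, author
    return None


def youtube_search_url(title: str, author: str) -> str:
    """Build a YouTube results-page URL for '<title> <author> audiobook'."""
    query = f"{title} {author} audiobook"
    return "https://www.youtube.com/results?" + urlencode({"search_query": query})


if __name__ == "__main__":
    hit = find_book("Finally reread Dune; Herbert holds up.")
    if hit:
        print(youtube_search_url(*hit))
```

Requiring the author's surname as well as the title cuts down on false positives for short, common titles like "Dune".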


Cool stuff. Is this open source, so one can modify it to search for specific genres of books or specific authors? Cheers!


Sure. The code's up on my site [0].

Please note that I'm a beginner programmer, so the code probably looks really ugly to professional devs. I plan on refactoring the code once I get more advanced in programming.

[0] https://capybasilisk.com/posts/2020/04/speculative-fiction-b...


Don't hate on your code. If it runs and does what you want, then it's good code. Refactoring later can be useful and improve things like performance and readability but what matters is that it works.


Thank you! It was the first time posting my code here, so I was kinda nervous, haha.


Awesome. Didn't know about the Reddit API wrapper PRAW! Thanks a ton, will give it a try.

One question: how does the locally run Python script feed the scraped data back to the bot's user account?


The bot's currently running on a virtual private server. The scraped data is all in a CSV file located in the same directory as the bot's source file. When the bot is run, it logs on to Reddit, loads the CSV data file into memory, and begins iterating through the Reddit comment stream checking for book title/author pairs.
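The load-once-then-stream structure described here can be sketched in a few lines. This is a hypothetical illustration, not the bot's actual code: it assumes the CSV has `title` and `author` header columns, uses plain substring matching for brevity, and takes any iterable of comment texts. With PRAW, the live stream would presumably come from something like `reddit.subreddit("all").stream.comments()`, passing each comment's `.body`.

```python
import csv
import io
from typing import Dict, Iterable, Iterator, Tuple


def load_books(csv_file) -> Dict[str, str]:
    """Read the whole title/author CSV into memory, keyed by lowercased title.

    Assumes a header row with 'title' and 'author' columns (hypothetical names).
    """
    return {row["title"].lower(): row["author"] for row in csv.DictReader(csv_file)}


def scan(comments: Iterable[str], books: Dict[str, str]) -> Iterator[Tuple[str, str, str]]:
    """Yield (comment, title, author) for each comment mentioning a title and its author's surname."""
    for body in comments:
        text = body.lower()
        for title, author in books.items():
            if title in text and author.split()[-1].lower() in text:
                yield body, title, author


if __name__ == "__main__":
    # Stand-in for the CSV file sitting next to the bot script.
    data = io.StringIO("title,author\nDune,Frank Herbert\nHyperion,Dan Simmons\n")
    books = load_books(data)
    # Stand-in for the live Reddit comment stream.
    for hit in scan(["Hyperion by Simmons is underrated", "nice weather"], books):
        print(hit)
```

Keeping the dataset in memory means each incoming comment is checked without touching the disk, which matters when the stream is every comment on the site.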


Good bot


That is seriously amazing. Thanks for sharing!


Woah! This is great! Thanks so much.


I created and maintain an audiobook iPhone app, Bound. I’ve been meaning to integrate LibriVox as a source but the API leaves a lot to be desired. Has anyone had experience with this? It seems like I should just use Archive.org as the source instead.

If anyone wants to check out my app: https://apps.apple.com/us/app/bound-audiobook-player/id10417...
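For comparison, here is a rough sketch of the two sources mentioned above: a LibriVox catalog query and the Internet Archive's per-item metadata endpoint (`https://archive.org/metadata/<identifier>`), which lists every file in an item. The LibriVox endpoint and parameter names (`format`, `limit`, `offset`, `title`, `author`) are as I recall them from LibriVox's API info page and should be checked before relying on them; the helper function names are hypothetical.

```python
import json
from urllib.parse import urlencode
from urllib.request import urlopen

# Endpoint and parameter names as recalled from LibriVox's API info page
# (assumption: verify before relying on them).
LIBRIVOX_API = "https://librivox.org/api/feed/audiobooks/"


def librivox_query_url(title=None, author=None, limit=10, offset=0) -> str:
    """Build a LibriVox catalog query URL that asks for JSON."""
    params = {"format": "json", "limit": limit, "offset": offset}
    if title:
        params["title"] = title
    if author:
        params["author"] = author  # assumed to match on the author's last name
    return LIBRIVOX_API + "?" + urlencode(params)


def archive_metadata_url(identifier: str) -> str:
    """Internet Archive metadata endpoint; the response lists every file in the item."""
    return f"https://archive.org/metadata/{identifier}"


def fetch_json(url: str):
    """Download and decode a JSON response (requires network access)."""
    with urlopen(url, timeout=30) as resp:
        return json.load(resp)
```

Paging with `limit`/`offset` keeps each response small; the Archive.org route is attractive because the audio files themselves live there anyway.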


One of the first 100% SwiftUI apps I built for practice was a LibriVox client.

Download the librivox iPhone client and use mitmproxy to see the endpoints it hits.

Oh yeah, looks like it hits the API at https://librivox.app (not the .org). I don't remember whether there was a relationship between the two services. For all I know it's a full third-party content mirror.
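The mitmproxy approach above can be automated with a small addon script that prints each distinct endpoint once. This is a sketch under assumptions: the `HOST_KEYWORD` filter and the file name are hypothetical, and it relies on mitmproxy calling a module-level `request(flow)` hook for every intercepted request.

```python
# log_endpoints.py -- run with: mitmdump -s log_endpoints.py
# mitmproxy calls module-level event hooks such as request(flow) for each
# intercepted request; `flow` is a mitmproxy.http.HTTPFlow.

HOST_KEYWORD = "librivox"  # hypothetical filter: only log hosts whose name contains this
seen = set()  # (method, host, path) tuples already printed


def request(flow) -> None:
    req = flow.request
    if HOST_KEYWORD not in req.pretty_host:
        return
    path = req.path.split("?", 1)[0]  # drop the query string so endpoints group together
    key = (req.method, req.pretty_host, path)
    if key not in seen:
        seen.add(key)
        print(f"{req.method} {req.scheme}://{req.pretty_host}{path}")
```

Stripping query strings means paginated calls such as `?page=2` collapse into a single line per endpoint.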


Oh, this is a completely different entity.

Seems a bit hostile to sweep up the branding of librivox like that.


Their about page says that Archive.org hosts their audio files for free.


I’ve been using Bound for a couple years and I absolutely love it.


Looks great - but the App Store says it requires iOS13. iPhone 6 user so I’m stuck on iOS12 - I couldn’t see from the release notes when that change came in?


Bound was the best $4 I’ve spent on an app, thanks Tim.


Came here to say the same thing. Awesome app. Great to be freed from the shackles of Audible and the likes :)


Wow! Thank you for the compliment. I really appreciate it.


As an android user, I'm so jelly. Those screenshots look delicious, well done.

I feel like amazon literally hasn't bothered updating their UI since the 90s. I love audible, but god damn their UI is depressing and actually makes listening to audiobooks a worse experience. There's no better alternative either :(


I can recommend Voice https://play.google.com/store/apps/details?id=de.ph1b.audiob... which is open source!


I haven't done much comparison, but Smart Audiobook Player has always served me well.


Your app looks very interesting, thanks for sharing.

Kind of off topic, but is it possible to sync your current playback position with a cloud provider using this app?


At this time no. Audiobooks and their current progress are local to the device.

This is a fairly complicated feature to implement and I haven’t had a chance to do it yet.
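To illustrate why even the simplest version takes some thought, here is a hypothetical last-writer-wins merge of per-book progress from two devices. It is one common approach, not how Bound works, and it deliberately ignores the hard parts: device clocks that disagree, and whether a newer but earlier position (the listener rewound on purpose) should win.

```python
from dataclasses import dataclass
from typing import Dict


@dataclass(frozen=True)
class Progress:
    position_s: float  # playback position within the book, in seconds
    updated_at: float  # Unix timestamp of the last change on the device that wrote it


def merge(local: Dict[str, Progress], remote: Dict[str, Progress]) -> Dict[str, Progress]:
    """Last-writer-wins merge of per-book progress from two devices."""
    merged = dict(local)
    for book_id, theirs in remote.items():
        ours = merged.get(book_id)
        if ours is None or theirs.updated_at > ours.updated_at:
            merged[book_id] = theirs
    return merged
```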


Understandable, thanks


It seems like there are a lot of LibriVox items available as podcasts on iTunes. Maybe there are similar feeds you can use.
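Podcast feeds are plain RSS 2.0, where each episode (here, each chapter) is an `<item>` carrying an `<enclosure url="...">`, so pulling the audio links out needs only the standard library. A minimal sketch; the sample feed and its example.org URLs are made up, and namespaced iTunes tags are ignored.

```python
import xml.etree.ElementTree as ET
from typing import List, Tuple


def chapter_urls(rss_xml: str) -> List[Tuple[str, str]]:
    """Return (episode title, audio URL) for each <item> that has an <enclosure>."""
    root = ET.fromstring(rss_xml)
    chapters = []
    for item in root.iter("item"):
        enclosure = item.find("enclosure")
        if enclosure is not None and enclosure.get("url"):
            chapters.append((item.findtext("title", default=""), enclosure.get("url")))
    return chapters


# Made-up two-chapter feed for illustration.
SAMPLE = """<rss version="2.0"><channel><title>Example Book</title>
<item><title>Chapter 1</title><enclosure url="https://example.org/ch01.mp3" type="audio/mpeg" length="1"/></item>
<item><title>Chapter 2</title><enclosure url="https://example.org/ch02.mp3" type="audio/mpeg" length="1"/></item>
</channel></rss>"""

if __name__ == "__main__":
    for title, url in chapter_urls(SAMPLE):
        print(title, url)
```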


Your app is awesome. I use it every day. I really like the smooth streamlined interface. Thanks for your effort.


With public domain ebooks in text based formats, if desired, it’s easy for a distributor or for the end user to change the display fonts (and font attributes) to make it easier or more pleasant to read. For audiobooks, these factors — the voice of the narrator, the intonations and expressions, accent and more — matter to have a good listening experience. Even commercially published audiobooks are dissed on the basis of having a poor narrator. An audiobook player doesn’t have a lot to provide on these factors (with the current technology) compared to ebooks.

With the above context, how is a dump of audiobooks like this useful to the masses without reviews from people who have listened to the works? The lack of reviews here is not a problem as much as the lack of a reviewing or commenting interface would be, IMO.

How do the current listeners (on HN) of works here decide what to download or not? Or is it just a matter of “download, listen and see if it’s good enough or move on”?


That's true, but I don't think it's a huge problem. Listening to an audiobook typically takes at least five hours, and you can usually tell within a couple of minutes whether you're happy with the narration.

While you say "a dump of audiobooks", it is certainly not that for the readers who put in a considerable amount of work for each and every title. And it is certainly not that for the listeners who can listen to books not otherwise available.

I don't think reading or listening to books at random (or based on average community score) makes sense. If you pick books based on trustworthy recommendations, most problems go away. Then just see where that book is available; if it is on Librivox and it's good enough, great. If not, get a commercial edition or a hard copy.

I've only listened to a single audiobook from Librivox. It was a book that has no commercial counterpart. The quality was mediocre but I was really happy to be able to listen to that specific book. On the other hand, I had to stop listening to another commercially produced book (that I also wanted to listen to) because I couldn't stand the narrator, so I do see your point of view.


The LibriVox app on iOS has reviews.

Otherwise, uh, it’s free. Take a listen, if it works, keep it. If not, drop it. Conundrum solved.


It has non-zero opportunity cost. The search for LibriVox seems to have no indication of the user rated quality of the recording and lists entries with no completed books.

I love audiobooks, but honestly don't see an easy way to interface with this site


> It has non-zero opportunity cost.

Then just don't use it at all. Go buy a sub to Audible.


Fuck audible. I get most of my audiobooks from the public library


All valid points, and I did spend quite a bit of time finding the narrators I like.

If you want a place to start, I can recommend the recordings by Mark Nelson - https://librivox.org/reader/251

He recorded a lot of pulp classics - "The Cosmic Computer", "First Lensman", "Hour of the Dragon", "Princess of Mars", "Space Viking". Professional quality recording, great voice, charming books.


I knew I was listening to a lot of audiobooks when I started having strong opinions about favoured and disfavoured narrators. NYT recently profiled Edoardo Ballerini as a kind of "voice of God", but, for me, that's John Man. What Don LaFontaine is to movie trailers, John Man is for audiobooks, at least to me.


> Or is it just a matter of “download, listen and see if it’s good enough or move on”?

Pretty much.

It can be pretty annoying when a book has multiple narrators as the quality can vary significantly, so you might get deep into a book before you start getting consistently bad narrators.

I can't remember the book now, but it put me off using LibriVox forever.

Now I try the local library app or the app from the local library where I used to live and if that fails there is always audible.


> It can be pretty annoying when a book has multiple narrators as the quality can vary significantly, so you might get deep into a book before you start getting consistently bad narrators.

I think one of the most annoying experiences I had was when they changed narrators in the third book of the Takeshi Kovacs series. It wasn't even that the third narrator was bad, especially, just that he sounded wrong—but I was far too invested to set it aside at that point.


> How do the current listeners (on HN) of works here decide what to download or not? Or is it just a matter of “download, listen and see if it’s good enough or move on”?

Pretty much download, listen, move on. But after a few, you might start recognising the reader, and have confidence in them. (It's been a while for me since I made heavy use of Librivox, so I've forgotten the ones I preferred.)


There are reviews in the Android app, which is what I usually listen to.

Narration is a personal preference, but I've found many of the narrations are pretty good. Even compared to paid ones.

I work in a laboratory where I can go hours without the ability to even touch my phone to turn something on or off, or change the volume. So far, I've only been terminally annoyed to the point I turned it off, despite wearing acid-contaminated gloves. For me, that's a pretty good track record.


Why are commercial audiobooks automatically higher quality while LibriVox is "a dump" of audiobooks? For what it's worth, I listened to about half a dozen audiobooks off LibriVox and the quality has ranged from good to superb.


LibriVox recordings are Public Domain in the USA. If you are not in the USA, please verify the copyright status of these works in your own country before downloading, otherwise you may be violating copyright laws.

Is there any easy-access repository that goes over the copyright patchwork for different works around the world? I yearn for the day when we have a more global schema for copyright/distribution rights.

As someone who travels frequently - it does my head in when a subscription content provider geo-blocks something I'm accessing.


Copyright is ridiculously hard. Beyond difficult.

Even if you have general guidance across the varying nations, most of the time you can't expect it to be compatible. In fact, I've run across differing and incompatible copyrights across states within the same nation. I've seen copyright laws that contradict themselves, and the courts reinterpreting that law in seemingly random ways, so understanding the law isn't a guarantee that you can obey it.

As things currently stand even determining who holds a copyright may not always be possible, and the person who holds the copyright may not be aware that they do.

Even just the concept of Public Domain does not exist in all countries, and does not mean the same thing across most countries - there are subtle differences that can lead to insane incompatibilities.

Distribution rights, whilst complicated, are far less complex than the tangled web that is copyright law.


Thanks - this more or less echoes what I suspected but never really delved into. It is a touch disheartening, and I don't see who would be incentivised to clarify or simplify things (or even be capable of it).


It’s really problematic. For a specific example, let’s look at Barrie’s Peter and Wendy. It was published in 1911, so is well below the current 1925 barrier in the US. In most countries he falls into the 70-years-after-death PD boundary, and in Mexico with their 90-years-after-death it’s fast approaching the PD date. But in the UK there’s a specific exception built into law that it’ll never go into the public domain: https://www.legislation.gov.uk/ukpga/1988/48/section/301


(I misremembered, it’s actually 100 years in Mexico: https://en.wikipedia.org/wiki/List_of_countries%27_copyright... )
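The arithmetic behind these dates is simple once you know that terms run to the end of a calendar year. A simplified sketch applying the terms cited in this thread to Barrie, who died in 1937; it ignores special cases such as the UK provision above, wartime extensions, and term changes that some countries did not apply retroactively. The function names are hypothetical.

```python
def life_plus_n_free_from(death_year: int, years_after_death: int) -> int:
    """First year a work is public domain under a life + N term.

    Terms run to the end of the calendar year, so the work is free from
    1 January of death_year + N + 1.
    """
    return death_year + years_after_death + 1


def us_published_before(as_of: int) -> int:
    """US cutoff from 2019 on: works published before this year are public domain.

    Covers works published 1923-1977, which get 95 years from publication,
    again ending on 31 December.
    """
    return as_of - 95


if __name__ == "__main__":
    print(life_plus_n_free_from(1937, 70))   # life + 70
    print(life_plus_n_free_from(1937, 100))  # Mexico's life + 100
    print(us_published_before(2020))         # the "1925 barrier"
```

So Peter and Wendy (1911) clears the US test, became free in life + 70 countries in 2008, and under Mexico's life + 100 would not be free until 2038.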


I agree to place anything I write myself in the public domain, so if it isn't a secret, you can copy it if you want to. I don't like copyright. I don't know if you want to make audio recordings of the stuff I have written, but at least now you know you are free to do so if you want to.

(If so, I would request that you make the recordings public domain and Opus format, although this is not required.)


Highly recommend David Clarke's work, especially his Sherlock Holmes readings. For me, he's the best Sherlock narrator I've heard.

He also runs a chain of Houston coffee shops; their mugs sit proudly in my cabinet! Felt great to compensate someone for many many hours of public good.


You compensated the owner of a chain of coffee shops by buying a few mugs?


There is a similar project for German audiobooks: https://www.vorleser.net/


Because of the public domain licensing, some LibriVox recordings are simply copied (with the LibriVox ID removed) and sold on Audible. (Search Audible for Karen Savage - e.g. Pride and Prejudice).


I prefer text-to-speech to LibriVox. I listened to a few books, and the only audiobooks I can listen to with a human narrator are ones where the narrator is really professional; otherwise I use TTS.


Which is your preferred TTS engine or product for this purpose?


I have a phone that's a few years old and still has Ivona on it; it's just perfect! Have a look at a sample at http://www.ttsforaccessibility.com/ - I use British English, female voice, Amy - perfect.


Librivox rocks dude. I volunteered and must await verification :D


While I prefer text-based formats myself, I think audio recordings have their uses, such as for people who prefer them, or in circumstances where reading isn't practical.

But there are some things I don't know. How do you deal with footnotes? What if some of the footnotes are unreferenced (like in a story I wrote)? What if the text has some word or name that the reader does not know how to pronounce? (What if the author also doesn't know, or the author knew but is now dead or for some other reason can no longer speak, and didn't write down the pronunciation?) What about foreign text? (The story I mentioned is mostly in English, although there is one passage written in Latin.)


With regard to pronunciation, this is often what separates a quality reader from a poor one.


I found a bug. I'm on Firefox on Windows, and when I open genres in new tabs, they don't open properly. The new tab just shows the top of the genres list, even though the URL is full of information about which genre I clicked.


If you enjoy Mark Twain, the John Greenman readings (https://librivox.org/reader/107) are fantastic.


I love LibriVox. Perfect for drawing too. I find the phone app easier to use than the website, though. Perhaps I might help out when I get some free time later.


Could audiobooks be used to train neural networks?


The audio files from LibriVox are commonly used to train neural networks for ASR and TTS, see e.g. https://www.danielpovey.com/files/2015_icassp_librispeech.pd... and https://arxiv.org/abs/1904.02882
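As an example of how a LibriVox-derived corpus is packaged: LibriSpeech (the first link) groups FLAC audio by speaker and chapter, with one transcript file per chapter in which each line is an utterance ID (`speaker-chapter-utterance`) followed by the uppercase transcript. A small sketch of reading that layout; the sample lines and IDs are made up, and the default root directory assumes the `train-clean-100` subset.

```python
from typing import Dict


def parse_transcript(text: str) -> Dict[str, str]:
    """Parse a LibriSpeech *.trans.txt file into {utterance_id: transcript}."""
    utterances = {}
    for line in text.splitlines():
        line = line.strip()
        if not line:
            continue
        utt_id, _, transcript = line.partition(" ")
        utterances[utt_id] = transcript
    return utterances


def audio_path(utt_id: str, root: str = "LibriSpeech/train-clean-100") -> str:
    """Map an utterance ID like '1234-5678-0000' to its FLAC file path."""
    speaker, chapter, _ = utt_id.split("-")
    return f"{root}/{speaker}/{chapter}/{utt_id}.flac"


if __name__ == "__main__":
    sample = "1234-5678-0000 CHAPTER ONE\n1234-5678-0001 IT WAS A DARK NIGHT\n"
    for utt_id, words in parse_transcript(sample).items():
        print(audio_path(utt_id), words)
```

Pairing each transcript line with its audio file like this yields the (audio, text) examples that ASR and TTS training consume.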


A bit off topic but what’s the status on any research toward leveraging audiobooks to produce text-to-speech? What are the hurdles in making it seamless and imperceptible? And what’s the “theoretical” number of recorded words/hour required to produce something reasonable?


I find their categorization amusing as this is there:

Non-fiction > Bibles > American Standard Version


Probably, a satisfactory justification can be either or both of:

1) historians/theologians reviewing a literal translation 2) believers of the faith

That said, I agree that the rest of us think it's a story. Maybe magical realism would be close enough?


FWIW, the Dewey Decimal system also classifies myths and religious texts separately from works of fiction, and separately from each other.


I suppose the Dewey Decimal System is a suitable reason to classify it as "non-fiction", but then presumably the Dewey Decimal numbers should be present (even if that is not the classification scheme of LibriVox, which is OK, I suppose).

(Anyways, the Bible is a collection of texts of various kinds. It isn't really a work of fiction in the way that most stories are, and even the texts they have are not entirely fictitious anyways; some parts might be historically accurate, some parts might be exaggerated, some parts might be lost, and some parts might be works of fiction, for example.)


The problem is that if you’re categorizing by “what fraction of people think this is true,” you have to group the Bible with Darwin and antivaxxers. You can’t decree “we’re organizing by truthfulness” unless you’re okay with other people organizing with what they think is true.

On the other hand, if you group it by topic covered and style, the Bible gets sorted with history, which isn’t great but at least is reasonable. And then there’s also the category of religion that fits even better.


Yes, the category of religion would fit better. Perhaps a categorization like: Religion > Religious texts > Bible > American Standard Version

However, some books might fit into multiple categorizations, depending on the type of categorization. (As one example, one type of categorization is Dewey Decimal System, so if you select the Dewey category then the subcategories will be the numeric categories, but the same book might also be found in other categories based on a different categorization scheme.)
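A catalog where one book is filed under several independent schemes is a straightforward data structure. A hypothetical sketch: each scheme maps a category path to the titles filed there, and browsing a category returns its whole subtree. The scheme names and paths below are illustrative only (in Dewey, 200 is Religion and 220 is the Bible).

```python
from collections import defaultdict
from typing import Dict, Set, Tuple

Path = Tuple[str, ...]


class Catalog:
    """Files each book under any number of independent classification schemes."""

    def __init__(self) -> None:
        # scheme name -> category path -> titles filed there
        self._index: Dict[str, Dict[Path, Set[str]]] = defaultdict(lambda: defaultdict(set))

    def add(self, title: str, scheme: str, path: Path) -> None:
        self._index[scheme][path].add(title)

    def browse(self, scheme: str, prefix: Path = ()) -> Set[str]:
        """Titles whose path under `scheme` starts with `prefix` (the whole subtree)."""
        found: Set[str] = set()
        for path, titles in self._index.get(scheme, {}).items():
            if path[: len(prefix)] == prefix:
                found |= titles
        return found


if __name__ == "__main__":
    catalog = Catalog()
    bible = "The Holy Bible (American Standard Version)"
    catalog.add(bible, "topic", ("Religion", "Religious texts", "Bible"))
    catalog.add(bible, "dewey", ("200", "220"))  # 200 = Religion, 220 = Bible
    print(catalog.browse("topic", ("Religion",)))
    print(catalog.browse("dewey", ("200",)))
```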



