Hacker News | throwawaygog6's comments

>>> If someone wants to host the raw files to allow others to download it let me know. It is a 83 GB tar.gz file which uncompressed is just over 1 TB in size.

Does anyone know if it is possible to download a similar dataset for YouTube and Reddit? I have ideas for a search engine based on it, but I don't want to write or maintain scraper scripts.


There's a very large dataset of Reddit posts and comments at https://files.pushshift.io/reddit/


I have most of the historical Reddit data, except for roughly a year, that I use to train ML models. Let me see if I can find a public link for you...


Open-sourcing a component means it no longer provides any competitive advantage. It just helps companies increase their open source credentials or decrease their cost of development over the next few years!


The thing with Venice is that it is a datastore. It does not have any business-specific logic. By itself, the only competitive advantage Venice provides is that it is a datastore that has first-class support for ingesting batch data. We feel this is not something that needs to be kept as a closely guarded secret. We have already discussed the architecture of Venice at conferences and in articles.

As for decreasing the cost of development over the next few years, it is actually quite the opposite. In the near term, the cost of development increases, as the system not only needs to work in your organization but also needs to work externally, with external technologies.

You are right that reducing the cost of development is a goal, but there are many things that need to be done initially to facilitate that and make the project self-sustaining.


I think companies looking to decrease their development costs are going to be disappointed with open sourcing... That does not happen, by and large. Increasing creds is an aspect. Increasing the quality of the software is another one, IMHO.


Normally, they will look at two years of stable (typically W-2) income (salary + RSUs) from one company before they make decisions. However, in the Bay Area, some mortgage lenders have started excusing people who jump companies because, well, jumping companies in the Bay Area is common. They now have to tweak their models.

Source: Recently went through the mortgage process.


Does anyone know how Dark Sky was able to do hyperlocal weather prediction?


While we are here, is there a service that provides a list of newly uploaded YouTube videos? In my mother tongue (a South Indian language), there are many interesting channels, but YouTube does not surface them by default. I just want to go through the list and write an algorithm to filter the videos based on my preferences.

Is there somebody who scrapes YouTube and sells the data?


Does the subscribe feature do what you are looking for?


Subscribe does not help me in discovering new channels on topics I want.


Subscribing doesn't show all videos, only those that YouTube thinks you may be interested in. If I get overwhelmed with content, I can unsub from channels I no longer watch. Instead, some channels sit in my subs list for years with regular uploads and never show up in my sub feed.


You can still use https://www.youtube.com/feeds/videos.xml?channel_id={CHANNEL... to pull up the RSS feed for a channel.
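For anyone who wants to script this, here is a minimal sketch of pulling that feed and listing a channel's recent uploads. It assumes the feed is Atom with a `yt:videoId` element per entry (the namespace URIs below are my assumption about the feed's schema), and the helper names `feed_url`, `parse_feed`, and `fetch_channel` are made up for illustration. You have to supply your own channel id.

```python
import urllib.parse
import urllib.request
import xml.etree.ElementTree as ET

# Namespace URIs are assumptions about the feed's schema, not confirmed here.
NS = {
    "atom": "http://www.w3.org/2005/Atom",
    "yt": "http://www.youtube.com/xml/schemas/2015",
}

FEED_BASE = "https://www.youtube.com/feeds/videos.xml"


def feed_url(channel_id):
    """Build the per-channel feed URL from a channel id (hypothetical helper)."""
    return FEED_BASE + "?" + urllib.parse.urlencode({"channel_id": channel_id})


def parse_feed(xml_text):
    """Return a list of {video_id, title, published} dicts, newest first."""
    root = ET.fromstring(xml_text)
    videos = []
    for entry in root.findall("atom:entry", NS):
        videos.append({
            "video_id": entry.findtext("yt:videoId", default="", namespaces=NS),
            "title": entry.findtext("atom:title", default="", namespaces=NS),
            "published": entry.findtext("atom:published", default="", namespaces=NS),
        })
    # ISO 8601 timestamps sort correctly as strings when they share an offset.
    videos.sort(key=lambda v: v["published"], reverse=True)
    return videos


def fetch_channel(channel_id, timeout=10):
    """Download and parse one channel's feed (needs network access)."""
    with urllib.request.urlopen(feed_url(channel_id), timeout=timeout) as resp:
        return parse_feed(resp.read().decode("utf-8"))
```

Calling `fetch_channel` in a loop over the channel ids you care about and merging the results would give a plain chronological list, which you could then filter however you like. It only covers channels you already know about, so it does not solve discovery.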


I use NewPipe on Android and it has an option to show a chronological list of all new videos from all channels I'm subscribed to.


Yep. Can confirm this. Weather apps provide location data. Email apps provide logged-in users. Source: ex-Yahoo person here.


That's ten requests per second, with some caching, I am assuming. How about a DigitalOcean droplet for $5/month?


That's what I proposed but there's a new designer that wants to use shiny tech. I'm just an advisor for a nonprofit, they ask me because I do it for free basically.


Sounds interesting. I am working on something similar. Can you share some way to contact you?


Interesting. I will drop a mail soon.


What is the best way to contact libraries/librarians for API access to their library?

I want to build an app that will drop me onto random pages of my checked-out books, so I don't have to gather the energy to do it myself. I have other features planned.

I am stuck on how to interface with my local library, like Libby does. Any ideas?


Librarian here.

Libraries are just paying for the ability to offer Libby to patrons. Libby then taps into a login API for the ILS a library uses to check if a patron's library card and PIN are tied to an eligible account, and Libby offers its own content based on whatever each library can afford to offer.

You would need to offer your own content, ultimately, with the Libby model. Same goes for OverDrive, Hoopla and Kanopy.


Just call 'em up and ask around, or send an email, I suppose. There's gotta be somebody there who knows.

Granted, they're unlikely to just give you special access, but you could at least learn something.

