>>> If someone wants to host the raw files to allow others to download them, let me know. It is an 83 GB tar.gz file, which uncompressed is just over 1 TB in size.
Does anyone know if it is possible to download a similar data set for YouTube and Reddit? I have ideas for a search engine based on it, but I don't want to write/maintain scraper scripts.
Open sourcing a component means it no longer provides any competitive advantage. It just helps companies increase their open source credentials or decrease their cost of development over the next few years!
The thing with Venice is that it is a datastore. It does not have any business-specific logic. By itself, the only competitive advantage Venice provides is that it is a datastore that has first-class support for ingesting batch data. We feel this is not something that needs to be kept as a closely guarded secret. We have already discussed the architecture of Venice at conferences and in articles.
As for decreasing the cost of development over the next few years, it is actually quite the opposite. In the near term, the cost of development increases, as the system not only needs to work in your organization but also needs to work externally, with external technologies.
You are right that reducing the cost of development is a goal, but there are many things that need to be done initially to facilitate that and make the project self-sustaining.
I think companies looking to decrease their development costs are going to be disappointed with open sourcing... That does not happen, by and large. Increasing creds is an aspect. Increasing the quality of the software is another one, IMHO.
Normally, they will look at two years of stable (typically W-2) income (salary + RSUs) from one company before they make decisions. However, in the Bay Area, some mortgage lenders have started excusing people who jump companies because, well, jumping companies in the Bay Area is common. They now have to tweak their models.
While we are here, is there a service that provides a list of newly uploaded YouTube videos? In my mother tongue (a South Indian language), there are many interesting channels, but YouTube does not surface them by default. I just want to go through the list and write an algorithm to filter the videos based on my preferences.
Is there somebody who scrapes YouTube and sells the data?
Subscribing doesn't show all videos, only those that YouTube thinks you may be interested in. If I get overwhelmed with content, I can unsub from channels I no longer watch. Instead, some channels sit in my subs list for years with regular uploads and never show up in my sub feed.
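For the "write an algorithm to filter them" idea, here is a minimal sketch. It assumes each channel still exposes a public Atom feed of recent uploads at https://www.youtube.com/feeds/videos.xml?channel_id=<CHANNEL_ID> (that is how it has worked for me, but it may change). The function name `filter_entries` and the keyword matching are my own illustration, not anything YouTube provides.

```python
import xml.etree.ElementTree as ET

# Namespace used by Atom feeds; channel feeds are assumed to be Atom.
ATOM = "{http://www.w3.org/2005/Atom}"


def filter_entries(feed_xml, keywords):
    """Return (title, url) pairs for entries whose title contains any keyword.

    feed_xml: the feed body, as str or bytes.
    keywords: strings matched case-insensitively against the title.
    """
    root = ET.fromstring(feed_xml)
    wanted = [k.lower() for k in keywords]
    matches = []
    for entry in root.iter(ATOM + "entry"):
        title = entry.findtext(ATOM + "title", default="")
        link = entry.find(ATOM + "link")
        url = link.get("href", "") if link is not None else ""
        if any(k in title.lower() for k in wanted):
            matches.append((title, url))
    return matches
```

You would fetch the feed body for each channel you care about (for example with urllib.request.urlopen(...).read()) and pass it in along with your keywords. Matching on titles is just the simplest possible preference filter; swap in whatever rule you like.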
That's what I proposed but there's a new designer that wants to use shiny tech. I'm just an advisor for a nonprofit, they ask me because I do it for free basically.
What is the best way to contact libraries/librarians for API access to their library?
I want to build an app that will drop me onto random pages of my checked-out books, so I don't have to gather the energy to do it myself. I have other features planned.
I am stuck on how to interface with my local library, like Libby does. Any ideas?
Libraries are just paying for the ability to offer Libby to patrons.
Libby then taps into a login API for the ILS a library uses to check if a patron's library card and PIN are tied to an eligible account, and Libby offers its own content based on whatever each library can afford to offer.
You would need to offer your own content, ultimately, with the Libby model.
Same goes for OverDrive, Hoopla and Kanopy.