throwawaygog6's comments

throwawaygog6 · on Oct 3, 2022

>>> If someone wants to host the raw files to allow others to download them, let me know. It is an 83 GB tar.gz file which, uncompressed, is just over 1 TB in size.

Does anyone know if it is possible to download a similar dataset for YouTube and Reddit? I have ideas for a search engine based on it, but I don't want to write or maintain scraper scripts.
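With an archive that is 83 GB compressed and about 1 TB uncompressed, it may be easier to stream it than to extract it to disk. A minimal sketch using Python's standard `tarfile` module in sequential mode (`"r|gz"`); the function name is hypothetical, and it reads each member fully into memory, so very large individual files would need chunked reads instead:

```python
import tarfile
from typing import BinaryIO, Iterator, Tuple


def iter_tar_gz_members(stream: BinaryIO) -> Iterator[Tuple[str, bytes]]:
    """Yield (name, contents) for each regular file in a gzipped tar stream.

    Mode "r|gz" reads the archive strictly front to back, so the uncompressed
    data never has to be written to disk. Each member must be consumed before
    the loop advances to the next one.
    """
    with tarfile.open(fileobj=stream, mode="r|gz") as archive:
        for member in archive:
            if not member.isfile():
                continue  # skip directories, symlinks, etc.
            handle = archive.extractfile(member)
            if handle is None:
                continue
            yield member.name, handle.read()
```

Passing an open file (`open("dump.tar.gz", "rb")`, path hypothetical) or a network response body works the same way, since only sequential reads are needed.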

teraflop · on Oct 3, 2022

There's a very large dataset of Reddit posts and comments at https://files.pushshift.io/reddit/
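As far as I know, those dumps decompress to newline-delimited JSON, one comment or submission per line. Assuming that layout and a `subreddit` field on each object (an assumption about the schema, not something stated here), filtering an already-decompressed text stream could look like this sketch; the decompression step itself is left out:

```python
import json
from typing import Iterable, Iterator


def filter_by_subreddit(lines: Iterable[str], subreddits: set[str]) -> Iterator[dict]:
    """Yield records whose 'subreddit' field is in `subreddits` (case-insensitive).

    Assumes one JSON object per line with a 'subreddit' key; this is an
    assumed layout, not a documented schema.
    """
    wanted = {s.lower() for s in subreddits}
    for line in lines:
        line = line.strip()
        if not line:
            continue  # tolerate blank lines
        try:
            record = json.loads(line)
        except json.JSONDecodeError:
            continue  # skip truncated or corrupt lines
        if str(record.get("subreddit", "")).lower() in wanted:
            yield record
```

Because it is a generator over lines, it never holds more than one record in memory, which matters at this dataset's size.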

testesttest4 · on Oct 3, 2022

I have most of the historical Reddit data (missing about a year), which I use to train ML models. Let me see if I can find a public link for you...

throwawaygog6 · on Sept 27, 2022

Open-sourcing a component means it no longer provides any competitive advantage. It just helps companies increase their open-source credentials or decrease their cost of development over the next few years!

nisargthakkar · on Sept 27, 2022

The thing with Venice is that it is a datastore. It does not have any business-specific logic. By itself, the only competitive advantage Venice provides is that it is a datastore that has first-class support for ingesting batch data. We feel this is not something that needs to be kept as a closely guarded secret. We have already discussed the architecture of Venice in conferences and articles.

As for decreasing the cost of development over the next few years, it is actually quite the opposite. In the near term, the cost of development increases, as the system not only needs to work within your organization but also externally, with external technologies.

You are right that reducing the cost of development is a goal, but there are many things that need to be done initially to facilitate that and make the project self-sustaining.

felixgv · on Sept 27, 2022

I think companies looking to decrease their development costs are going to be disappointed with open-sourcing... That does not happen, by and large. Increasing creds is one aspect. Increasing the quality of the software is another, IMHO.

throwawaygog6 · on Sept 26, 2022

Normally, lenders look at two years of stable (typically W-2) income (salary + RSUs) from one company before making a decision. However, in the Bay Area, some mortgage lenders have started making exceptions for people who switch companies because, well, switching companies is common there. They have had to tweak their models.

Source: I recently went through the mortgage process.

throwawaygog6 · on Sept 13, 2022

Does anyone know how Dark Sky was able to do hyperlocal weather prediction?

throwawaygog6 · on July 5, 2022

While we're here: is there a service that provides a list of newly uploaded YouTube videos? In my mother tongue (a South Indian language), there are many interesting channels, but YouTube does not surface them by default. I just want to go through the list and write an algorithm to filter it based on my preferences.

Is there somebody who scrapes YouTube and sells the data?
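Once such a list exists, the filtering step can start very simply. A minimal sketch, assuming each video has a title and a channel name (the `Video` record and `filter_videos` function are hypothetical); it keeps videos whose titles mention enough liked keywords and drops blocked channels:

```python
from dataclasses import dataclass
from typing import Iterable, List


@dataclass
class Video:
    # Hypothetical record; real fields depend on the feed or service used.
    title: str
    channel: str


def filter_videos(
    videos: Iterable[Video],
    liked_keywords: Iterable[str],
    blocked_channels: frozenset = frozenset(),
    min_hits: int = 1,
) -> List[Video]:
    """Keep videos whose titles contain at least `min_hits` liked keywords.

    Matching is case-insensitive substring matching with no word splitting,
    so it also works for non-Latin scripts.
    """
    liked = [k.casefold() for k in liked_keywords]
    kept = []
    for video in videos:
        if video.channel in blocked_channels:
            continue
        title = video.title.casefold()
        hits = sum(1 for k in liked if k in title)
        if hits >= min_hits:
            kept.append(video)
    return kept
```

Keyword matching is only a starting point; the same function shape could later be swapped for a learned ranking without changing the callers.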

ttymck · on July 5, 2022

Does the subscribe feature do what you are looking for?

throwawaygog6 · on July 5, 2022

Subscribe does not help me in discovering new channels on topics I want.

Akronymus · on July 6, 2022

Subscribing doesn't show all videos, only those that YouTube thinks you may be interested in. If I get overwhelmed with content, I can unsubscribe from channels I no longer watch. Instead, some sit in my subscriptions list for years, uploading regularly, and never show up in my subscription feed.

kmfrk · on July 5, 2022

You can still use https://www.youtube.com/feeds/videos.xml?channel_id={CHANNEL... to pull up the RSS feed for a channel.
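As far as I know, that feed is served as Atom XML. Assuming standard Atom `<entry>` elements with `title`, `link`, and `published` children, a sketch that pulls out the entries with the standard library could look like this (fetching the feed text is left out, and `parse_feed_entries` is a hypothetical name):

```python
import xml.etree.ElementTree as ET
from typing import Dict, List

ATOM = "{http://www.w3.org/2005/Atom}"  # standard Atom namespace


def parse_feed_entries(xml_text: str) -> List[Dict[str, str]]:
    """Return title, link href, and published timestamp for each Atom <entry>."""
    root = ET.fromstring(xml_text)
    entries = []
    for entry in root.findall(f"{ATOM}entry"):
        link = entry.find(f"{ATOM}link")
        entries.append({
            "title": entry.findtext(f"{ATOM}title", default=""),
            "link": link.get("href", "") if link is not None else "",
            "published": entry.findtext(f"{ATOM}published", default=""),
        })
    return entries
```

Polling one feed per channel and collecting the entries gives the "list of newly uploaded videos" asked about above, without scraping HTML.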

gxqoz · on July 5, 2022

I use NewPipe on Android and it has an option to show a chronological list of all new videos from all channels I'm subscribed to.
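A chronological "all subscriptions" view like that boils down to merging per-channel lists by date. This is not NewPipe's actual implementation, just a sketch of the idea, assuming each channel's list is already sorted newest-first and timestamps are comparable (e.g. ISO 8601 strings in one consistent format):

```python
import heapq
from typing import Iterable, List, Tuple

Item = Tuple[str, str]  # (published timestamp, title); hypothetical shape


def merge_feeds(*feeds: Iterable[Item]) -> List[Item]:
    """Merge per-channel lists into a single newest-first list.

    Each input must already be sorted newest-first; heapq.merge then
    combines them without re-sorting everything.
    """
    return list(heapq.merge(*feeds, key=lambda item: item[0], reverse=True))
```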

throwawaygog6 · on July 3, 2022

Yep, can confirm this. Weather apps provide location data. Email apps provide logged-in users. Source: ex-Yahoo person here.

throwawaygog6 · on July 2, 2022

That's ten requests per second, assuming some caching. How about a DigitalOcean droplet for $5/month?
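For scale, here is what a sustained rate translates to in daily and monthly totals. This assumes a flat rate and a 30-day month; real traffic is burstier, so peak load would be higher than the average:

```python
SECONDS_PER_DAY = 24 * 60 * 60  # 86,400


def daily_and_monthly_requests(requests_per_second: float, days_per_month: int = 30):
    """Convert a sustained request rate into (per-day, per-month) totals."""
    per_day = requests_per_second * SECONDS_PER_DAY
    return per_day, per_day * days_per_month
```

At 10 requests per second that is 864,000 requests per day, or about 26 million per 30-day month.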

is_true · on July 2, 2022

That's what I proposed, but there's a new designer who wants to use shiny tech. I'm just an advisor for a nonprofit; they ask me because I basically do it for free.

throwawaygog6 · on July 2, 2022

Sounds interesting. I am working on something similar. Could you share a way to contact you?

throwawaygog6 · on July 2, 2022

Interesting. I will drop a mail soon.

throwawaygog6 · on June 28, 2022

What is the best way to contact libraries/librarians for API access to their library?

I want to build an app that drops me onto random pages of my checked-out books, instead of me having to gather the energy to do it. I have other features planned.

I am stuck on how to interface with my local library the way Libby does. Any ideas?
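The "random page" part is the easy bit once the checkout list is available. A minimal sketch, assuming each checked-out book is known with its page count (the `Book` record and `random_page` function are hypothetical):

```python
import random
from dataclasses import dataclass
from typing import Iterable, Tuple


@dataclass
class Book:
    # Hypothetical record; a real app would fill this from the checkout list.
    title: str
    page_count: int


def random_page(books: Iterable[Book], rng=random) -> Tuple[str, int]:
    """Pick a random book uniformly, then a random 1-based page within it."""
    candidates = [b for b in books if b.page_count > 0]
    if not candidates:
        raise ValueError("no books with pages to choose from")
    book = rng.choice(candidates)
    return book.title, rng.randint(1, book.page_count)
```

Choosing the book first gives every book an equal chance; weighting by length instead would mean `rng.choices(candidates, weights=[b.page_count for b in candidates])[0]`. Passing a seeded `random.Random` as `rng` makes the picks reproducible for testing.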

thebitstick · on June 30, 2022

Librarian here.

Libraries are just paying for the ability to offer Libby to patrons. Libby then taps into a login API for the integrated library system (ILS) a library uses, to check whether a patron's library card and PIN are tied to an eligible account, and Libby offers its own content based on whatever each library can afford to offer.

Ultimately, with the Libby model, you would need to offer your own content. The same goes for OverDrive, Hoopla, and Kanopy.

shannifin · on June 28, 2022

Just call 'em up and ask around, or send an email, I suppose. There's gotta be somebody there who knows.

Granted, they're unlikely to just give you special access, but you could at least learn something.