Show HN: Export HN saved links (upvotes) as JSON or CSV (github.com/amjd)
82 points by amjd on May 23, 2016 | 16 comments



Putting in the request now for HN to update their API to provide this: https://github.com/HackerNews/API

I figure that since a comment of mine sparked 'saved comments' becoming a thing, lightning could strike twice... but then I guess the API would need to allow authentication for private data.


> We hope to improve the API over time, and may later enable access to private per-user data using OAuth.

It's been like that for a long time, I think.


I wonder if they'll take volunteers.


It needs a sleep(30) between page requests or it will get your IP banned.


Scraping HN is hard. I got an IP ban even when using sleep with a random duration.
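
For readers who want to try the pause-between-requests idea from the two comments above, here is a minimal sketch. The 30-second base comes from the sleep(30) suggestion; the jitter range, the backoff doubling, and the names polite_delay and fetch_pages are illustrative assumptions, not anything taken from the script or documented by HN. As the comment above notes, random sleeps alone may still not prevent a ban.

```python
import random
import time


def polite_delay(base_seconds=30.0, jitter_seconds=10.0, attempt=0):
    """Return how many seconds to wait before the next page request.

    base_seconds and jitter_seconds are illustrative values, not limits
    documented by HN. Each failed attempt doubles the base wait.
    """
    backoff = base_seconds * (2 ** attempt)
    return backoff + random.uniform(0, jitter_seconds)


def fetch_pages(fetch_page, page_numbers, sleep=time.sleep):
    """Call fetch_page(n) for each page, sleeping between requests.

    fetch_page is a hypothetical callable supplied by the caller; the
    sleep parameter exists so the pause can be replaced in tests.
    """
    results = []
    for index, page in enumerate(page_numbers):
        if index > 0:
            # No need to wait before the very first request.
            sleep(polite_delay())
        results.append(fetch_page(page))
    return results
```

Passing the sleep function as a parameter keeps the pacing logic separate from the scraping logic, so the delay can be tuned or raised after an error without touching the fetch code.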


Alas, I couldn't get this working. Never used Python before so I'm unsure how to fix it, but it seems like something breaks when the title is undefined.

I was excited to use it, too: I was going to write a script that fed this into Pinboard. Maybe I'll code up my own upvoted item scraper later.

    Enter your HN account details:
    Username: firloop
    Password:
    Logging in...
    Logged in successfully.
    Error getting data for page 1
    Traceback (most recent call last):
      File "export_links.py", line 137, in <module>
        main()
      File "export_links.py", line 113, in main
        if len(tree_title) < 61:
    UnboundLocalError: local variable 'tree_title' referenced before assignment
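
For anyone new to Python: an UnboundLocalError right after an "Error getting data" message usually means a name was assigned only inside a try block that failed. The sketch below reproduces that pattern and one possible fix. It is not the code from export_links.py; parse_titles and both count functions are hypothetical stand-ins, and only the variable name tree_title is borrowed from the traceback.

```python
def parse_titles(page_html):
    """Hypothetical stand-in for the script's page parser."""
    if "<a" not in page_html:
        raise ValueError("no links found")
    return [page_html]


def count_titles_buggy(page_html):
    try:
        tree_title = parse_titles(page_html)
    except ValueError:
        print("Error getting data for page")
    # If parse_titles raised, tree_title was never assigned, so this
    # line fails with UnboundLocalError.
    return len(tree_title)


def count_titles_fixed(page_html):
    tree_title = []  # Bind the name before the try block.
    try:
        tree_title = parse_titles(page_html)
    except ValueError:
        print("Error getting data for page")
    return len(tree_title)
```

Binding a default value first lets the function continue with an empty result; returning early from the except branch would work as well.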


Hi, I just fixed that error. Can you clone the latest code and try again? If you still have any problems, please create an issue on GitHub. :)


No dice (same error as commenter above).


Can you run pip install cssselect and try again?


This idea is cool!

But I get an error when grabbing links.

    $ python export_links.py
    Enter your HN account details:
    Username: 0xcmp
    Password:
    Logging in...
    Logged in successfully.
    Error getting data for page 1
    Error getting data for page 2
    Error getting data for page 3
    Error getting data for page 4
    [...snip]


Note: the script requires you to enter your HN authentication details. (Unavoidable, since the HN API supports neither authentication nor an endpoint for saved stories.)


Well, you would be running it on your own machine anyway, so I guess it doesn't really matter.


It's ten times better than entering your credentials on another website, but the code could still do some fun stuff with them. I kinda doubt it, but just saying.


The code is there for you to see, so what harm can it possibly do? If you're cautious about this, I think you should be just as cautious about entering your username/password into web browsers to sign into websites...


A codebase can easily get big enough that you would miss it transmitting your credentials unless you spent hours reading it. Even in a 100-line script you can overlook a well-hidden request, especially if it's extra data included in an existing request.


On a different note, you don't need either of these (an API or an endpoint). You can send your credentials over HTTP and scrape the pages.
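
A rough sketch of what "send your credentials and scrape" could look like using only the Python standard library. Everything site-specific here is an assumption that should be checked against the live site: the /login path, the acct and pw form field names, and the /upvoted?id=...&p=... URL. The function names are hypothetical and this is not how export_links.py is written.

```python
import http.cookiejar
import urllib.parse
import urllib.request

BASE_URL = "https://news.ycombinator.com"


def build_login_request(username, password):
    """Build (but do not send) a form POST to the assumed login path."""
    # The field names "acct" and "pw" are assumptions about the login form.
    form = urllib.parse.urlencode({"acct": username, "pw": password})
    return urllib.request.Request(
        BASE_URL + "/login", data=form.encode("utf-8"), method="POST"
    )


def make_session():
    """Return an opener that keeps the login cookie between requests."""
    jar = http.cookiejar.CookieJar()
    return urllib.request.build_opener(urllib.request.HTTPCookieProcessor(jar))


def fetch_upvoted_page(opener, username, page=1):
    """Fetch one page of upvoted items as HTML (assumed URL layout)."""
    query = urllib.parse.urlencode({"id": username, "p": page})
    with opener.open(BASE_URL + "/upvoted?" + query) as response:
        return response.read().decode("utf-8")
```

Usage would be roughly: create an opener with make_session(), call opener.open(build_login_request(user, password)), then call fetch_upvoted_page(opener, user, page) for each page, pausing between requests. Parsing the returned HTML is left out here.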



