Saving a webpage to a PDF is literally one command line away:

    chromium --headless --disable-gpu --print-to-pdf=google.pdf http://google.com/

What does Apify add in this case?



You can call the API from anywhere, from resource-constrained servers, Docker containers that cannot run headless Chrome, JavaScript on a website etc. Also, we'll keep adding new features to this act to make it worthwhile to use it, e.g. retries on failures, posting of the file to some URL etc.


This is exactly what I do, bundled into a nice function in my .zshrc:

    # Usage: chromepdf OUTPUT.pdf URL
    chromepdf() {
        chrome --headless --disable-gpu --print-to-pdf="$1" "$2"
    }


Is there any way to make Chrome wait until the page loads?


Looks like it prints automatically once Page.loadEventFired is triggered.

Alternatively, you can run Chrome headless with the remote debugging API (--remote-debugging-port=9222) and send a Page.printToPDF (https://chromedevtools.github.io/devtools-protocol/tot/Page/...) after some delay.
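
For what it's worth, here is a rough sketch of that remote debugging approach as shell functions. It is not a tested recipe. It assumes `chromium`, `curl`, `jq`, `websocat` and `base64` are on your PATH; `jq` and `websocat` are just my picks for JSON parsing and a WebSocket client, and the function names are made up. The `websocat` flags in particular are worth checking against its `--help` for your version.

```shell
# Sketch only. Assumes chromium, curl, jq, websocat and base64 are installed.
# jq and websocat are illustrative choices, not tools named in the thread.

# Build the DevTools protocol request for Page.printToPDF with the given id.
print_request() {
    printf '{"id":%s,"method":"Page.printToPDF","params":{}}\n' "$1"
}

# Read a Page.printToPDF response on stdin and write the decoded PDF to $1.
# The protocol returns the PDF base64-encoded in result.data.
save_pdf_response() {
    jq -r '.result.data' | base64 -d > "$1"
}

# Hypothetical wrapper: chromepdf_delayed OUTPUT.pdf URL [DELAY_SECONDS]
chromepdf_delayed() {
    local out=$1 url=$2 delay=${3:-5}
    local chrome_pid ws_url

    # Start headless Chrome with the debugging port open, in the background.
    chromium --headless --disable-gpu --remote-debugging-port=9222 "$url" &
    chrome_pid=$!

    # Crude fixed wait: covers browser startup plus time for page scripts.
    sleep "$delay"

    # /json lists the open targets; take the WebSocket URL of the first page.
    ws_url=$(curl -s http://localhost:9222/json |
        jq -r '[.[] | select(.type == "page")][0].webSocketDebuggerUrl')

    # Send one request, read one reply. -B raises websocat's message size
    # limit (assumed flag; the base64 PDF is usually far larger than 64 KiB).
    print_request 1 | websocat -n1 -B 67108864 "$ws_url" |
        save_pdf_response "$out"

    kill "$chrome_pid"
}
```

Something like `chromepdf_delayed google.pdf http://google.com/ 10` would then wait ten seconds before printing. A fixed sleep is the bluntest possible delay; polling the page for some condition before sending the request would be more reliable, but needs more protocol plumbing than fits in a comment.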


What do you use this for?


Save a webpage as a PDF


Did not mean to sound negative, but what is the use-case of saving web pages as PDFs? I understand building in the functionality in something else, but here it sounds like you manually type/paste in URLs on a regular basis.

Edit: I see now that I replied to the wrong comment. It was meant for the person who made an alias for it.


Also, if you can't access this command for whatever reason, another option is to open the print dialog in Chrome and set the destination to "Save as PDF". You'll even get to see a preview. It's very useful for one-off saves where you want to read a really long post offline in a PDF viewer.


Sure, but the idea is to do it programmatically.


Thanks, I didn't know that was a thing. I was checking out wkhtmltox earlier this week as well.

Looks like there are a lot of options available for this. I suspect Apify's using one of them.


Actually we're simply using Puppeteer - see the source code at the bottom of https://www.apify.com/jancurn/url-to-pdf


Any special reason why you use this over Chrome?


Puppeteer is a scripting library for Chrome, built by the Chrome team.


There's also wkhtmltopdf, which does more or less the same thing using WebKit.


Are there any downsides to me using this to build my own archiver?


Many projects are already using this for archiving purposes. Check out https://github.com/pirate/bookmark-archiver

It's robust because the modern web is pretty much built for Chrome, although it can be resource-intensive if you're archiving many sites.


Converting to PDF is lossy, so I wouldn't use this for archiving purposes. I normally use webrecorder for archiving.


Is there a way to add a delay to let JS render?


Now there is - I've just added the "sleepMillis" input option.



