Saving a webpage to a PDF is literally one command line away: chromium --headles...

jancurn · on Nov 2, 2017

You can call the API from anywhere: resource-constrained servers, Docker containers that cannot run headless Chrome, JavaScript on a website, etc. Also, we'll keep adding new features to this act to make it worthwhile to use, e.g. retries on failures, posting the file to some URL, etc.

xilni · on Nov 2, 2017

This is exactly what I do, bundled into a nice function in my .zshrc:

    chromepdf() {
        # $1 = output PDF path, $2 = URL to print (quoted so URLs with & or ? survive)
        chrome --headless --disable-gpu --print-to-pdf="$1" "$2"
    }

sgolestane · on Nov 3, 2017

Is there any way to make Chrome wait until the page loads?

lashkari · on Nov 6, 2017

Looks like it prints automatically once Page.loadEventFired is triggered.

Alternatively, you can run headless Chrome with the remote debugging API enabled (--remote-debugging-port=9222) and send a Page.printToPDF command (https://chromedevtools.github.io/devtools-protocol/tot/Page/...) after some delay.
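For the plain command-line case, another option that I believe headless Chrome supports is the --virtual-time-budget flag, which lets the page's timers run for a given number of virtual milliseconds before the PDF is printed. A minimal sketch, assuming the flag is available in your build; `chromepdf_wait` and the `CHROME_BIN` override variable are hypothetical names for illustration, not part of Chrome:

```shell
# Hypothetical wrapper: print a URL to PDF after giving the page some
# virtual time to finish rendering.
# CHROME_BIN is an assumed override; it defaults to the `chrome` command.
chromepdf_wait() {
    # $1 = output PDF path, $2 = URL, $3 = optional budget in ms (default 5000)
    "${CHROME_BIN:-chrome}" --headless --disable-gpu \
        --virtual-time-budget="${3:-5000}" \
        --print-to-pdf="$1" "$2"
}
```

Usage would look like `chromepdf_wait out.pdf https://example.com 2000`. The budget is virtual time, not wall-clock time, so this is only an approximation of "wait N milliseconds".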

zulln · on Nov 2, 2017

What do you use this for?

nsomaru · on Nov 3, 2017

Save a webpage as a PDF

zulln · on Nov 4, 2017

I did not mean to sound negative, but what is the use case for saving web pages as PDFs? I understand building the functionality into something else, but here it sounds like you manually type or paste in URLs on a regular basis.

Edit: I see now that I replied to the wrong comment. It was meant for the person who made an alias for it.

nickjj · on Nov 2, 2017

Also, if you can't access this command for whatever reason, another option is to open the print dialog in Chrome and set the destination to "Save as PDF". You'll even get to see a preview. It's very useful for one-off saves where you want to read a really long post offline in a PDF viewer.

Scarbutt · on Nov 2, 2017

Sure, but the idea is to do it programmatically.
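As a rough illustration of what "programmatically" could mean with nothing but the command line, here is a sketch that converts a list of URLs in one go. It reuses the `chrome --headless --print-to-pdf` invocation from earlier in the thread; `pdf_batch`, the `page-N.pdf` naming scheme, and the `CHROME_BIN` override are all hypothetical choices made for this example:

```shell
# Hypothetical helper: read URLs from stdin, one per line, and print each
# to page-N.pdf in the current directory.
# CHROME_BIN is an assumed override; it defaults to the `chrome` command.
pdf_batch() {
    n=0
    while IFS= read -r url; do
        [ -n "$url" ] || continue      # skip blank lines
        n=$((n + 1))
        # </dev/null keeps the browser from consuming the URL list on stdin
        "${CHROME_BIN:-chrome}" --headless --disable-gpu \
            --print-to-pdf="page-$n.pdf" "$url" </dev/null
    done
}
```

Something like `pdf_batch < urls.txt` would then produce one PDF per non-empty line. There are no retries or error handling here, which is part of what a hosted API could add.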

scaryclam · on Nov 2, 2017

Thanks, I didn't know that was a thing. I was checking out wkhtmltox earlier this week as well.

Looks like there are a lot of options available for this. I suspect Apify's using one of them.

jancurn · on Nov 2, 2017

Actually we're simply using Puppeteer - see the source code at the bottom of https://www.apify.com/jancurn/url-to-pdf

zulln · on Nov 2, 2017

Any special reason why you use this over Chrome?

nikisweeting · on Nov 3, 2017

Puppeteer is a scripting library for Chrome, built by the Chrome team.

Natsu · on Nov 3, 2017

There's also wkhtmltopdf, which more or less does the same thing using WebKit.

granda · on Nov 2, 2017

Are there any downsides to me using this to build my own archiver?

nikisweeting · on Nov 3, 2017

Many projects are already using this for archiving purposes. Check out https://github.com/pirate/bookmark-archiver

It's robust because the modern web is pretty much built for Chrome, although it can be resource-intensive if you're archiving many sites.

xfer · on Nov 3, 2017

Converting to PDF is lossy, so I wouldn't use this for archiving purposes. Normally I use webrecorder for archiving.

Keats · on Nov 2, 2017

Is there a way to add a delay to let JS render?

jancurn · on Nov 2, 2017

Now there is - I've just added the "sleepMillis" input option.