Thanks! I was inspired by looking at unim.press and wondering, “how hard would it be to put an extract from the page there instead of lorem ipsum?” (Quite difficult, it turns out—I don’t think there’s any way short of human curation or maybe some sort of ML training on a large corpus—but you can get reasonably far with a carefully tuned set of CSS selectors for commonly seen class names.)
It took me about 8 or 10 hours yesterday to build, and I think about half of that was tweaking the heuristics for the paragraph/image extraction and keyword selection (the “STYLE, 3” that links to the thread page).
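For anyone curious what the class-name approach could look like, here is a minimal sketch using only the Python standard library. It is not the code behind either project. The class names in `CANDIDATE_CLASSES`, the `ExtractParser` class, and the `extract_paragraphs` helper are all hypothetical, and it assumes reasonably well-formed markup with explicitly closed `<p>` tags.

```python
from html.parser import HTMLParser

# Assumption: a hand-picked list of class names often seen on article bodies.
# A real list would need tuning against actual sites.
CANDIDATE_CLASSES = ("article-body", "post-content", "entry-content", "story-body")

# Void elements never get a closing tag, so they must not affect nesting depth.
VOID_TAGS = {"area", "base", "br", "col", "embed", "hr", "img",
             "input", "link", "meta", "source", "track", "wbr"}


class ExtractParser(HTMLParser):
    """Collects <p> text found inside the first-matching content container."""

    def __init__(self):
        super().__init__(convert_charrefs=True)
        self.depth = 0        # nesting depth inside a matched container (0 = outside)
        self.in_p = False     # currently inside a <p> within the container
        self.buf = []         # text fragments of the current paragraph
        self.paragraphs = []  # finished paragraphs, in document order

    def handle_starttag(self, tag, attrs):
        if tag in VOID_TAGS:
            return
        if self.depth:
            self.depth += 1
            if tag == "p":
                self.in_p = True
                self.buf = []
        else:
            classes = (dict(attrs).get("class") or "").split()
            if any(c in CANDIDATE_CLASSES for c in classes):
                self.depth = 1

    def handle_endtag(self, tag):
        if tag in VOID_TAGS or not self.depth:
            return
        if tag == "p" and self.in_p:
            # Collapse runs of whitespace so the extract reads cleanly.
            text = " ".join("".join(self.buf).split())
            if text:
                self.paragraphs.append(text)
            self.in_p = False
        self.depth -= 1

    def handle_data(self, data):
        if self.in_p:
            self.buf.append(data)


def extract_paragraphs(html, limit=2):
    """Return up to `limit` paragraphs from containers with a known class name."""
    parser = ExtractParser()
    parser.feed(html)
    parser.close()
    return parser.paragraphs[:limit]
```

Calling `extract_paragraphs(page_html)` on a page whose body sits in, say, `<div class="post-content">` returns the first couple of paragraphs and skips navigation and footer text outside that container. Pages that use none of the listed class names return an empty list, which is where the tuning effort goes.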
OP of unim.press here. Yep, extracting text is pretty hard, though I think Instapaper and Pocket have good tech in that space. It's certainly more than a weekend project's worth of work, which is why most of my own focus was on the layout and visual fidelity. But it's cool to see that you put more time into extracting the content. I think for HN, which is mostly longform online writing, it definitely adds value.