I'm working on pure.md[1], which lets your scripts, APIs, apps, agents, etc reli...

shoebham · 2025-03-31T06:07:36 1743401256

Love the recursion redirect at pure.md/pure.md

andrethegiant · 2025-03-31T16:58:39 1743440319

You found the easter egg!

WillAdams · 2025-03-31T13:04:29 1743426269

It seems to miss URLs?

At: https://willadams.gitbook.io/design-into-3d/2d-drawing the links for:

https://mathcs.clarku.edu/~djoyce/java/elements/elements.htm...

https://mathcs.clarku.edu/~djoyce/java/elements/bookI/bookI....

https://mathcs.clarku.edu/~djoyce/java/elements/bookI/defI1....

are rendered as:

_Elements_ _:_ _Book I_ _:_ _Definition 1_

Maybe detect when a page is on GitBook, or on another site whose .md source lives on GitHub or elsewhere, and grab the original instead?

andrethegiant · 2025-03-31T16:01:17 1743436877

By default, href values of <a> tags are removed, because they add significant token length without adding more context. Coming soon, you'll be able to specify a request header that controls whether links are removed from the response. The underscores you mentioned come from the italics.
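To illustrate why the output above shows only the italic link text: a minimal sketch of dropping link targets from Markdown while keeping the visible text. This is not pure.md's actual code; the function name `strip_link_targets` and the regex are assumptions made for illustration, and it only handles simple inline links.

```python
import re

# Matches inline Markdown links like [text](target), skipping images (![alt](src)).
# Hypothetical pattern for illustration; nested brackets are not handled.
_LINK = re.compile(r'(?<!!)\[([^\]]*)\]\([^)\s]*(?:\s+"[^"]*")?\)')


def strip_link_targets(markdown: str) -> str:
    """Replace each inline link with its visible text, dropping the URL."""
    return _LINK.sub(r"\1", markdown)


if __name__ == "__main__":
    sample = "[_Elements_](https://example.com/a) _:_ [_Book I_](https://example.com/b)"
    print(strip_link_targets(sample))
```

With the URLs gone, only the italic markers around each link's text remain, which matches the underscores in the rendered output.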

metadat · 2025-04-01T01:06:47 1743469607

Cool project!

Recently discussed, too: https://news.ycombinator.com/item?id=43462894 (10 comments)

wild_egg · 2025-03-31T00:10:20 1743379820

Thanks for sharing. I was planning on building something like this in April after hitting too many issues with Jina and Tavily but it looks like you've already done the hard work!

andrethegiant · 2025-03-31T01:44:29 1743385469

Thanks! Still a work in progress :-)

wanderingbit · 2025-03-31T02:43:47 1743389027

What a great idea, I will soon be a paying customer. This solves a problem of an app I'm using that I was hesitant to try to develop myself.

andrethegiant · 2025-03-31T03:33:16 1743391996

Much appreciated!

hardlyfun · 2025-03-31T04:17:55 1743394675

Very nice. How did you manage to bypass sites with Cloudflare Turnstile set up?

udev4096 · 2025-03-31T05:22:15 1743398535

FlareSolverr, most probably

erekp · 2025-03-31T13:47:11 1743428831

How exactly do you fall back to Common Crawl? Isn't the cost to even hold and query Common Crawl insane?

andrethegiant · 2025-03-31T15:57:28 1743436648

With AWS Athena, you can query the contents of someone else’s public S3 bucket. You pay per read, but if you craft your query the right way then it’s very inexpensive. Each query I run only scans about 1MB of data.
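A rough sketch of what such a lookup might look like, not the query pure.md actually runs. It assumes Common Crawl's columnar URL index has been registered in Athena as `ccindex.ccindex` (the table name used in Common Crawl's own examples); the helper names and the crawl label in the usage note are placeholders. Filtering on the `crawl` and `subset` partitions is what keeps the scan narrow, though how much data a given query reads depends on the table layout.

```python
def build_index_query(url: str, crawl: str) -> str:
    """Build an Athena SQL string that locates one URL's WARC record.

    Assumes the Common Crawl columnar index is available as ccindex.ccindex.
    """
    # Double single quotes so the values are safe inside SQL string literals.
    safe_url = url.replace("'", "''")
    safe_crawl = crawl.replace("'", "''")
    return (
        "SELECT warc_filename, warc_record_offset, warc_record_length\n"
        "FROM ccindex.ccindex\n"
        f"WHERE crawl = '{safe_crawl}'\n"
        "  AND subset = 'warc'\n"
        f"  AND url = '{safe_url}'\n"
        "LIMIT 1"
    )


def warc_range_header(offset: int, length: int) -> dict[str, str]:
    """HTTP Range header that fetches only the matching record from the WARC file."""
    # Byte ranges are inclusive, so the last byte is offset + length - 1.
    return {"Range": f"bytes={offset}-{offset + length - 1}"}
```

For example, `build_index_query("https://example.com/", "CC-MAIN-2025-13")` yields the SQL to submit to Athena, and the returned filename, offset, and length can then be used for a ranged GET against the public Common Crawl data, so the full archive never has to be stored or downloaded.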

wfn · 2025-04-01T16:46:52 1743526012

Since I happened to be looking at this just now, here are some examples of how to query at a ~cent-per-query cost level (just examples, but quite illustrative): https://commoncrawl.org/blog/index-to-warc-files-and-urls-in...

m0rde · 2025-03-31T10:20:12 1743416412

Is there an example we can see?

27theo · 2025-03-31T10:26:56 1743416816

https://pure.md/https://news.ycombinator.com/item?id=4353323...
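The link above follows the pattern of prefixing the target URL with `https://pure.md/`. A minimal sketch of doing the same from a script, assuming the service returns the page as plain text to an unauthenticated GET; the function names are made up for illustration, and any headers, rate limits, or API keys the service may require are not covered here.

```python
import urllib.request

PURE_MD_PREFIX = "https://pure.md/"


def pure_md_url(target: str) -> str:
    """Prefix a target URL with pure.md, mirroring the link pattern in this thread."""
    return PURE_MD_PREFIX + target


def fetch_markdown(target: str, timeout: float = 30.0) -> str:
    """Fetch the Markdown rendering of a page (assumes a plain GET is accepted)."""
    with urllib.request.urlopen(pure_md_url(target), timeout=timeout) as response:
        return response.read().decode("utf-8", errors="replace")


if __name__ == "__main__":
    print(fetch_markdown("https://example.com/")[:500])
```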

sharpshadow · 2025-04-01T12:47:36 1743511656

Works great on mobile, thanks. A helpful tool to bypass flaky websites, JS, and even some paywalls.

udev4096 · 2025-03-31T05:24:44 1743398684

[flagged]

NationOfJoe · 2025-03-31T06:42:06 1743403326

I have no skin in the game, and honestly I am wondering how this idea contributes to enshittifying the web more?

This idea just seems to provide the same content as visiting the site, only in a different view, like reader mode?

hbsbsbsndk · 2025-03-31T12:53:18 1743425598

The service seems designed to bypass anti-scraping measures. If site owners don't want their content scraped by AI this is subverting their will.

It also obfuscates responsibility between the AI vendor and the scraping service. One can imagine unethical AI providers using a series of ephemeral "gateways" to access content while avoiding any legal or reputational harm.

elric · 2025-03-31T12:00:10 1743422410

I think the parent is referring to the goal of making the web more "LLM friendly".