
I once ended up on Encyclopedia Dramatica before I knew what it was. It was some semi-informational article about some person and then I scrolled down and there was an absolutely disgusting picture I'm not even going to describe here. I, for one, don't mind that it's not on the top of my search results.

The entire job of a search engine is to curate results from a vast internet and boil them down to what is probably interesting for you. You also don't want all kinds of "shock sites" on top if you search for "gun homicide" or "ISIS beheadings". Most likely you want some background information on these things, not "shock sites". That's the service Google and similar engines provide.



ED is an incredible repository of useful factual data dressed up as a trolling/shock site. Google is absolutely not de-listing it because they're worried about you. There's much worse content available on Google; the difference is that the other stuff isn't politically sensitive.


ED is an amazing time capsule of the 4chan and SA-adjacent cultures that sprouted up in the early 2000s. Sad to see the state it's in now.


But don't you think a query containing the exact name of a website is fairly likely in search of content from that site? In terms of recognizing user intent, I'd consider the need to use a search operator here a failing.

I commented because learning of such special-casing of a site by Google [a] was quite memorable for me at the time. Since I also see plenty of other threads around us debating search engine curation, what follows is a summary of what I found.

I came across some Reddit threads from a certain time in 2014 which noted that the top Google results for some subjects were ED pages that mentioned them. Perhaps ED's downranking was in response to that. Whatever the cause, this is actually a more severe downranking than that famously applied to thepiratebay.org [b]; for example, the query thepiratebay org foo returns only results from the actual site, [c] but the same query for ED returns no results from ED. [d]

I did find two more cases where Google does return a (single) page from ED. The sole query without any operators that does so is encyclopediadramatica online (and its punctuation/whitespace-equivalent variants), for which the Main Page is the first result and the only ED result. [e]

The other case is when searching for a phrase within quotes that occurs nowhere else in Google's index other than on ED. For example, the expected only result of the query "Encyclopedia Dramatica help pages" is the ED page containing that text. [f]

So, to be more precise, ED is severely downranked rather than delisted, albeit with the result that no one can find it on Google without already knowing its URL (or a hapax legomenon quoted from its pages).

-----

[a][b][c][d][e][f] If I may use a simplistic model of Google that determines a score for each potential search result (i.e., 'distinct' crawled URL) by starting with the same initial score for all results and applying a sequence of steps to each result that each change its score, the following is a speculative explanation for all these behaviours:

- Following the meat of the algorithm is the downrank-specific-sites step, which decreases the score of each page of a downranked site by some amount (affects both TPB and ED: TPB scores are decreased moderately to implement [b]; ED scores are decreased massively, explaining [a]).

- Then comes the query-contains-site-URL step, which greatly increases the scores of all pages of that site (TPB results are now higher than non-TPB results, explaining [c]; ED scores were decreased so much that they are still lower than all other results, explaining [d]).

- Then comes the query-is-site-URL-exactly step, which makes the score for that site's base URL (which in the case of ED redirects to the Main Page) higher than any other score (explaining [e]).

- Search operators are last, and thus the highest-priority (explaining [f] and the site: operator).

These steps have held up in general where applicable for all the queries that I've tried.
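The footnote model can be sketched as a toy ranking function. Everything concrete here is a hypothetical assumption for illustration only, not how Google actually works: the tiny corpus and its URLs (including `encyclopediadramatica.online` as ED's domain), the term-matching stand-in for "the meat of the algorithm", the penalty and boost values (only their relative sizes matter), and the assumption that only the top few results are shown, which is how a massively downranked page ends up invisible rather than merely last.

```python
import re
from dataclasses import dataclass

# Hypothetical penalties for the downrank-specific-sites step.
# Only the relative sizes matter: moderate for TPB, massive for ED.
DOWNRANK = {
    "thepiratebay.org": 50.0,
    "encyclopediadramatica.online": 1000.0,
}
# Hypothetical boost for the query-contains-site-URL step.
SITE_IN_QUERY_BOOST = 100.0


@dataclass
class Page:
    url: str
    site: str
    text: str


def squash(s: str) -> str:
    """Lowercase and drop punctuation/whitespace, so 'thepiratebay org' matches 'thepiratebay.org'."""
    return re.sub(r"[^a-z0-9]", "", s.lower())


def search(query: str, pages: list[Page], top_k: int = 3) -> list[str]:
    # Extract operators now; they are applied last, as hard filters.
    phrases = [p.lower() for p in re.findall(r'"([^"]+)"', query)]
    sites = re.findall(r"site:(\S+)", query)
    plain = re.sub(r'"[^"]*"|site:\S+', " ", query)
    terms = set(re.findall(r"[a-z0-9]+", plain.lower()))

    scores = {}
    for page in pages:
        words = set(re.findall(r"[a-z0-9]+", page.text.lower()))
        # Hypothetical stand-in for the meat of the algorithm: 10 points per matching term.
        score = 10.0 * len(terms & words)
        # Downrank-specific-sites step.
        score -= DOWNRANK.get(page.site, 0.0)
        # Query-contains-site-URL step (punctuation/whitespace ignored).
        if squash(page.site) in squash(query):
            score += SITE_IN_QUERY_BOOST
        scores[page.url] = score

    # Query-is-site-URL-exactly step: the site's base URL outranks every other page.
    for site in {p.site for p in pages}:
        base = site + "/"
        if squash(query) == squash(site) and base in scores:
            others = [s for url, s in scores.items() if url != base]
            scores[base] = max(others, default=0.0) + 1.0

    # Operators come last and filter regardless of score (highest priority).
    results = [
        p for p in pages
        if all(ph in p.text.lower() for ph in phrases)
        and (not sites or p.site in sites)
    ]
    results.sort(key=lambda p: scores[p.url], reverse=True)  # stable for ties
    return [p.url for p in results[:top_k]]


# Hypothetical toy corpus.
PAGES = [
    Page("thepiratebay.org/", "thepiratebay.org", "the pirate bay torrents foo"),
    Page("thepiratebay.org/foo", "thepiratebay.org", "foo torrent listing"),
    Page("encyclopediadramatica.online/", "encyclopediadramatica.online",
         "main page encyclopedia dramatica"),
    Page("encyclopediadramatica.online/Foo", "encyclopediadramatica.online",
         "foo article encyclopedia dramatica help pages"),
    Page("example.com/foo", "example.com", "foo background information"),
    Page("news.example/foo", "news.example", "foo news report"),
    Page("wiki.example/foo", "wiki.example", "foo wiki article"),
]

if __name__ == "__main__":
    for q in ["thepiratebay org foo", "encyclopediadramatica online foo",
              "encyclopediadramatica online", '"Encyclopedia Dramatica help pages"']:
        print(q, "->", search(q, PAGES))
```

Under these assumptions, `thepiratebay org foo` ranks both TPB pages first [c]; `encyclopedia dramatica` and `encyclopediadramatica online foo` return no ED pages [a][d], because the boost cannot offset the massive penalty; `encyclopediadramatica online` (or `encyclopediadramatica.online`) puts the base URL first with no other ED page shown [e]; and the quoted phrase returns only the ED page containing it [f].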



