
I began to notice some time ago that Google basically disregards my query, and fixates on the lowest common denominator. So, recently I was trying to search for a particular event or quote or something related to some famous person. But no matter how I worded the query, Google ignored everything but the person's name, and returned only fluffy flattering results about the person from popular magazine sites.

So I tried Bing, and the thing I was looking for was result #1. Like how it used to be with Google. So I switched to Bing.

Now after a few weeks of that, I see that Bing does the exact same thing much of the time. Totally different queries + same general subject = identical top ten results.

So. Anyone fancy creating the 2022 version of what Google was in 1998?



> I began to notice some time ago that Google basically disregards my query, and fixates on the lowest common denominator.

Which is funny. Google beat other search engines in the early 2000s because it actually did find what people were looking for, not what the "search engine" wanted people to see.

Now it's more and more the latter, I imagine because it's more lucrative for Google to display the results advertisers pay for...

That's really the product of the lack of competition in the search space. Nothing more. Why should Google bother? It would take billions in VC for a competitor to truly threaten Google's dominance in search.

Same with YouTube. YouTube straight up doesn't care about search terms anymore and will just show some results YouTube "cooked up" for the user. Unbelievable...


Consider also the possibility that they do want to deliver good results but their algorithms just passed their useful limit some time ago and people can game the system faster than they can improve the system, but there's too much money on the table to ever admit this in public.


This is the answer that's getting little attention. The SEO game didn't just give us hidden keywords; there are hundreds of millions of 'sites' and blogs for all these topics that are basically crap. Google can't tell the difference anymore. The majority of the internet is now mostly hidden (except during search) spam.


But I can tell the difference in the blink of an eye. There must be a way to train an ML model on "crappy SEO"…
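Something like this could be a starting point - a minimal sketch, assuming you had hand-labeled pages to train on. The example texts and labels below are hypothetical placeholders, and a real classifier would need far more data and features:

    # Toy bag-of-words classifier for "SEO spam vs. genuine content".
    # Training texts and labels are made-up placeholders.
    from sklearn.feature_extraction.text import TfidfVectorizer
    from sklearn.linear_model import LogisticRegression
    from sklearn.pipeline import make_pipeline

    pages = [
        "Top 10 BEST ways to fix error 0x80070005 (UPDATED 2022)!!!",
        "The error comes from the allocator freeing the buffer twice; patch below.",
    ]
    labels = [1, 0]  # 1 = SEO spam, 0 = genuine content

    model = make_pipeline(TfidfVectorizer(ngram_range=(1, 2)), LogisticRegression())
    model.fit(pages, labels)

    print(model.predict(["21 AMAZING tricks to speed up Python (number 7 will SHOCK you)"]))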


Surely they could just have a system to filter out bad sites?

For example, if I want information about some programming stuff, I only really want

* Wikipedia

* StackExchange

* MDN

* cpp-reference

* whatever official docs

There's absolutely no need to have

* geeksforgeeks

* w3schools

* cyberciti.biz

* random wordpress blogs

This takes all of 5 minutes to code; you could even have a userscript for it.
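For instance, a rough Python sketch of the allowlist idea (the domains are just the ones named above; a real version would hook into whatever your search results come from, e.g. as a userscript filtering the results page):

    # Keep only results whose host is on a hand-picked allowlist.
    from urllib.parse import urlparse

    ALLOWED = {
        "en.wikipedia.org",
        "stackoverflow.com",
        "stackexchange.com",
        "developer.mozilla.org",
        "en.cppreference.com",
    }

    def keep(url: str) -> bool:
        host = urlparse(url).hostname or ""
        return any(host == d or host.endswith("." + d) for d in ALLOWED)

    results = [  # hypothetical results list
        "https://developer.mozilla.org/en-US/docs/Web/API/fetch",
        "https://www.geeksforgeeks.org/fetch-api/",
    ]
    print([r for r in results if keep(r)])  # drops the geeksforgeeks hit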


I have no problems with and frequently use w3schools. I also find tons of programming content on other sites where people post code they are working on, or tips. I don't know that I've ever really referenced Wikipedia for anything programming-related. I normally just do a Stack Exchange site search through Google. I feel like this is the only area where Google search is still decent, but anything after the first page, maybe the second, is literally nonsense.


Google benefits from a chaotic and spam-filled ecosystem. At this point, technology and network speeds are such that anyone could re-code a Google if all they did was ignore the 99.9% of websites out there that are crap and that Google (and Amazon and a few other spam-benefiters) had a hand in creating or promoting.

This is why Adwords has to go. It allowed Google to weaponize the collective man-power of the world to create and regurgitate new ad-space for Google to monetize. Ad-space that didn't exist, and doesn't need to exist.


uBlacklist exists


> it actually did find what people where looking for

I believe that is more a function of the web being less commercially relevant back then.

And every small and then massively growing online community goes through the same evolution:

While the community is small, its commercial value is small and the information is less tainted - once it becomes big enough, the commercial value of that community becomes worthwhile to game.


The problem is that Google was only good in 1998 precisely because it was pre-Google. Now the web is SEO'd to hell and actively trying to prevent you from getting good search results.

A "new old Google" would only be good at searching the 1998 web, and if all you want is nostalgia, http://theoldnet.com is right there.


Exactly.

I don't think it's only the Google search algorithm that's to blame; this has a lot to do with the extinction of old-school forums and blogs. These days a large part of the real discussions and posts without financial motivation have moved to Facebook, WhatsApp groups, Discord, Slack and other places behind logins and paywalls which Google can't index. What's left in public is mostly blogs and websites motivated by affiliate or ad revenue and SEO'ed to death. So there is simply a much higher garbage-to-valuable-content ratio in the public, indexable web.


95% of my Google searches are suffixed with "MDN" or "site:reddit.com".

If I'm looking for something particularly technical I'll search HN. That especially helps when I feel at my wit's end about some general concept like "sinuses" or "parenting". It's more common to get my mind blown by some offhanded revelation dropped by an HN commenter.


Same, I also often use site:reddit.com. Thankfully Reddit is still mostly indexable, but most of the other sites where discussions take place are not.


My experience is that Google continually improved up until around 2014 or so. For the last 3 or 4 years it has slowly been getting worse.


I spent the holidays with someone who does all their searches using only voice input. His eyesight is poor and he chooses to just say what he wants instead of putting on his reading glasses and typing it out. The types of things he was saying, and the level of understanding his phone had of him, wouldn't have been possible in 2014.


Ironically, I specifically remember the introduction of voice search coinciding with a marked drop in search quality. This had also happened earlier with their "instant results" experiment.


Google did good work fighting SEO over-optimization for a long time. Then they gave up and it all went to hell. I stopped using them a few years ago; I found their practice of dropping search terms infuriating. I switched to DuckDuckGo, which is arguably lower quality but less infuriating.


Agree. The web material being searched is bad at the source. So there's little that a search engine can do to improve it. As the adage says - garbage in, garbage out.


The web was SEO'd to hell in '98 also, for other search engines. Google came along with a better algorithm for sussing out the signal of what content people found useful from the noise of attempts to trick the crawler into inflating a page's relevance.

It's not entirely clear what the next iteration of the algorithm should be... SEO has gotten very good at its game.


> The web was SEO'd to hell in '98 also, for other search engines.

But mostly with invisible meta tags, not phrases repeated multiple times and text written by content creators with no passion for the subject.

Today's web posts remind me of the ridiculous SEO-driven "effective product names" on the many sites selling low-value products or fakes.

SEO aside, the '98 web was passionate, while today's web is written for robots, not humans.


So we need a new web.


We need competition. A new web would only be a temporary benefit, precisely because it would provide competition - until it doesn't.


The issue, though, is that fracturing off a new web likely wouldn't solve it, just introduce another fracture. Instead of the web being open, with people running their own information sites, content is split based on the creator's preferred medium. Some people post on Reddit, some on YT, some on FB, Instagram, etc. Each of the "major sites" has its own atmosphere where any subgroup can exist. In the early years each subject had its own community website, or a few of them, often with overlapping members and links to each other. The purpose of the sites was strictly information and community around that subject. There was normally a forum, and then write-ups/blog posts, featured content and links to other sites.

I used to visit like 20 or 30 forums every day and get great content. Back then, answering and getting help was a lot easier too. I had an RSS setup that pulled it all in and things were great. Somewhere around 2009 things just started to fall apart, with each dominant social media or content site having their own sections for everything.

The barrier to entry for setting up a wide-ranging community became far too low, and with that came a low barrier to entry for bad information. I'm in several Facebook groups and subreddits concerning topics that interest me and the information is often crap. Also the same questions over and over. All the things people would complain about on the older forums are now the bulk of the content in FB groups, none of the good features are there, the format is terrible and every discussion eventually seems to turn into politics.


Gemini is nice.


> So. Anyone fancy creating the 2022 version of what Google was in 1998?

I've tried, not quite there yet, but it's got its moments.

https://search.marginalia.nu/


Thank you. Looks sane. I'll give it a try in the next few days (a quick search for gopher was better than I expected).


It's situationally very good in some topics. Perhaps not a replacement for Google, but at least a complement.


I suspect the change we are seeing is that Google is now using a neural network to (re-)interpret the search query.[1] Presumably that neural network also calculates some sort of neural hash for document/paragraph/sentence similarity. The upside is that it can correct more typos, intelligently drop irrelevant terms and understand the meaning of the query to some extent. But the downside is less precision when you know exactly what you want. It sometimes even seems to ignore the quotes syntax for exact string matching. Very frustrating, and very poor quality control.

[1] I bet they train these models using unsuccessful queries (ones where the user did not click any search result) as inputs, and the final search query after which the user left as the desired output.
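To illustrate the embedding idea (purely speculative - nothing here is known to be what Google actually runs), here's a sketch that ranks documents by cosine similarity of sentence embeddings instead of exact term matching. The model name is just a common open one:

    # Rank documents by embedding similarity rather than literal term overlap.
    import numpy as np
    from sentence_transformers import SentenceTransformer

    model = SentenceTransformer("all-MiniLM-L6-v2")
    query = "soviet ruble exchange rate 1970"
    docs = [
        "Historical exchange rates of the Soviet ruble, by decade",
        "Russian ruble hits a new low against the dollar today",
    ]
    q, *vecs = model.encode([query] + docs)
    cos = [float(np.dot(q, v) / (np.linalg.norm(q) * np.linalg.norm(v))) for v in vecs]
    for score, doc in sorted(zip(cos, docs), reverse=True):
        print(f"{score:.3f}  {doc}")
    # Both documents score fairly high despite different wording -- exactly the
    # loss of precision described above when you wanted the literal terms.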


From personal experience, the final search query is the one where the user realizes that none of the results contain useful information, or their actual keywords, and rage-quits.

The user then asks someone in the office or goes and spends an hour at a university library.


I've noticed exactly this. One thing I do from time to time is try to find a song after just remembering partial lyrics. It used to work so well it felt like magic.

The last two weeks I've had two occurrences where the lyrics I remembered contained a common word which was also a brand name. It focused on that and completely ignored the rest of my query. So in both cases I added "lyrics" and now it ignored all but one word which happened to be the title of another song, no matter how I massaged the query.


I was attempting to find some historical data on the Soviet Ruble the other day, but found the task nearly impossible because Google "helpfully" considers "Russian" and "Soviet" to be synonyms, and so all the results were about the modern day Russian Ruble. I can't think of any other examples off the top of my head, but this isn't the first time I've run into this sort of problem.


That's very odd. I just typed "Soviet Ruble" into Google and every single result on the first page is about the Soviet, not modern Russian, currency[*]. This includes the box of images on the right-hand side, all of which depict USSR money.

Not really sure what to make of this.

[*] With the arguable exception of an Encyclopedia Britannica article on "ruble", which covers the Russian, Soviet and Belarusian currencies at the same time: "ruble, the monetary unit of Russia (and the former Soviet Union) and Belarus (spelled rubel)..."


I believe Google's algorithm will confound such experiments because it tailors search results according to user history. Which may mean that some portion of the HN crowd could see worse than average performance if they are more likely to clear cookies and/or search without a login.


I hate that they do this. Sometimes I just want neutral results: nothing scoped to my location, my search history, or preferences I don't even know I have. Especially when researching a controversial topic and trying to get new data.


I think you're right: some sort of personalisation could well be contributing to the discrepancy.

P.S. Driven by curiosity, I repeated my experiment from an incognito window and it yielded the same results. This time I looked at the results beyond page one, and at least the first few pages of results are 100% relevant.


That's probably not a bad substitution for the average person, but a bad one for anyone with a precise vocabulary.

I get the feeling, though, that the need to compensate for mobile misspellings means a dumbing down of precise vocabulary.


> I began to notice some time ago that Google basically disregards my query, and fixates on the lowest common denominator.

Same here. I was perplexed to see the undisputed leader of search engines (which nobody managed to successfully rival, no matter how many billions they threw at the problem) decline in this way.

But now I wonder: is fixating on the lowest common denominator perhaps ultimately the more profitable approach?

Compare this to Amazon. A decade ago, buying on Amazon used to be a fantastic experience; no other retailer could rival it. Now the experience is utter crap, as are many of the products. But that crap still outsells everyone else by a large margin.

Perhaps we are seeing a general shift towards a focus on volume, rather than quality.


I think this is how giants fall. The various titans of times long gone - steel companies, chemical companies, mines, etc. - were once so mighty that it was impossible to imagine them faltering. And then they stumbled, then tripped, then eventually fell.

Google, Facebook and Amazon will probably all eventually be replaced by a plucky, energetic and hungry competitor out of nowhere, just as they exploded in the faces of their predecessors.


Even the lowest common denominator has gone too far. Querying "Barcelona" gives you 100% results for the football club above the fold. You must search "Barcelona, Spain" to get information on the city, which then gets you direct links to Google Maps, etc.


I also get these results, and I've never spectated a sports game in my life. These aren't just customized results for different people - if Google keeps a "completely uninterested" personal score, I'd have the maximum value for sports.


Yeah, their entity resolution algorithms are really annoying. Half my searches come up with some random movie on IMDb.


If I search for "Barcelona", I get result about the city on positions 6, 14, 21

If I search for "Barcelona city" do not get any results about the Soccer team.


Searching "Barcelona" I get the wikipedia for Barcelona city on position 2.


I remember around 2010-2012 it felt really great. You could learn the keyword-fu skill, and with a few keyword-change/reorder iterations you could explore topics and find obscure things on the internet. Now that method does not work at all. Always the same results, and you cannot find specific things. Around that time they started adding ML/AI, and now searching with keywords is extremely unsatisfying.


I've noticed the same. Google "amusement parks italy" and you get a list of world-famous parks (such as Central Park, NY).


The search results aren't the same; you are fingerprinted.

It's been this way for years now, at least in how drastically it alters search results for different users in the same country speaking the same language.


For me every single result was quality articles or Wikipedia listing amusement parks in Italy.

I'm in Paris.

I don't click on trash sites, ever. Not sure what else might bias your results.

Try creating a new Chrome profile and searching.


ddg gives me:

Amusement parks in Italy, top 5 fun parks you have to visit

20 of Italy's best amusement parks - TravelMag

THE 10 BEST Water & Amusement Parks in Italy - Tripadvisor

10 Best Theme Parks in Italy - Find the best Amusement ...

Family amusement in Italy: 15 excellent parks - Italy ...

Gardaland | The biggest amusement park in Italy

Amusement Park Emilia Romagna Italy | Mirabilandia

The 5 Best Theme Parks in Italy: Italy Logue

THE 10 BEST Water & Amusement Parks in Tuscany - Tripadvisor

Category:Amusement parks in Italy - Wikipedia

Not great, but seems reasonable. Google gives similar answers to be fair.


Huh. My first result is pretty darn helpful and relevant:

https://public.kulak.us/google-search.png


I get mostly the same as the first result.


Not quite as big of an offender, but when I was looking up "nice restaurants" with Google Maps fixed on my city, it instead started looking up restaurants in Nice, France.


And as someone who lives in Nice, you can imagine how much of a pain it is to look for local venues or services when Google thinks it's an adjective.


I get extremely relevant results from Google for "amusement parks italy". I use a privacy proxy (VPN) and a browser in privacy mode. Perhaps Google only switches into guess-what-you-meant mode when it can link your search to their profile of you?


Current location bias?

There used to be a way to turn off location priority in advanced search.


I only got sites about amusement parks in Italy.


Digital products could have a “finished” state, which is great for users, but bad for companies.

Dropbox could've been a finished product in 2012: a simple, focused personal storage solution. But that can't justify the valuation of Dropbox, the company.

Same for Evernote.

However, could Google be a "finished" product at some point (e.g., in 2000)? Probably not. When Google was incorporated in 1998, they indexed only 25 million web pages. As the number of web pages grows exponentially, Google as a product needs to evolve, e.g., doing a better job of fighting web spam and blackhat SEO... The problem is that the web evolves way faster than Google can improve their search result relevance.


> So. Anyone fancy creating the 2022 version of what Google was in 1998?

I don't know if it works well, but there's Neeva [1]. It started as a search engine with a paid subscription model but then switched to freemium pricing, with a premium tier that will come "soon".

[1] https://neeva.com/


I almost feel as if they have like 100k actual pages that they present (and have looked through manually) and if it's not in that group they just show you the closest one (or say "no results").


I have no knowledge or evidence, but this has indeed become my mental model for whatever Google actually does these days. I do wonder what they actually do. All those engineers, for 20 years. Surely they haven't just been scaling up BackRub! But I find it very hard to believe that they're crawling the whole web. I find it very hard to believe that PageRank is still the same PageRank that we understood in 1998. And it looks like they're managing to editorialize quite heavily, even if they're doing it via algorithm somehow. But again, I can't really discern what they're doing anymore.

So for now, I have to disagree with the "garbage in, garbage out" theory. I don't believe Google has the same goal now that they did then.


> I find it very hard to believe that PageRank is still the same PageRank that we understood in 1998.

It's not, because as good as PageRank was, it was vulnerable to being exploited by link farms, which started popping up in its wake. I do remember that by the mid-2000s, about 5% of search results were pages just spamming search keywords and hyperlinks.
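For anyone who hasn't seen it, here's a toy power-iteration version of the classic PageRank idea - enough to show why link farms worked: every extra inbound edge pumps up a page's score. The graph is obviously made up:

    # Toy PageRank: two "farm" pages exist only to link to "target".
    import numpy as np

    pages = ["real-site", "farm-1", "farm-2", "target"]
    links = {0: [3], 1: [3], 2: [3], 3: [0]}  # src -> list of dst indices

    n, d = len(pages), 0.85  # d is the usual damping factor
    M = np.zeros((n, n))
    for src, dsts in links.items():
        for dst in dsts:
            M[dst, src] = 1 / len(dsts)

    rank = np.full(n, 1 / n)
    for _ in range(50):
        rank = (1 - d) / n + d * M @ rank

    print(dict(zip(pages, rank.round(3))))  # "target" wins on farm links alone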


Do they say "no results" anymore? It seems like Google ignores parts of your query until you get results, no matter whether they are specifically connected to your search.


Yes, they do.

I frequently research historical topics on India, and I get no results or a single-digit number of results.


I have no inside information on this, just my recent usage of Google, but I believe Google seems to be guessing at people's intended search query rather than performing the query based on the actual terms.

This is probably great for most people, as Google's own data shows that most people do indeed search for the same things at the same time, so guessing intention (especially in relation to current affairs and the queries of others) is probably a winning strategy for giving most people what they want - even if their search terms were a bit junk.

The downside is that the ability to hone results by tweaking or rearranging one's search terms goes largely ignored. Previously one could peel away layers of results with such meddling; now there will usually be some word or name in the search query which Google is affectionate towards, and the results are immovable from that.


> Google seems to be guessing at people's intended search query rather than performing the query based on the actual terms.

I've read that Google uses "machine learning" for their search results, which I interpret to be exactly as you say: they provide a stereotyped result based on what they think you are searching for (possibly optimized either for what is inevitably clicked on or for ad revenue), instead of actually matching terms.

What this means is that search results may be more accurate in some statistical way - more people click on the top result - but it also pumps up the number of edge cases where it guesses wrong, while simultaneously making it impossible to tell where the results went wrong, because you can't understand how they were generated (compared to, e.g., keyword search, where good or bad, the reason you got a result is obvious).


> Anyone fancy creating the 2022 version of what Google was in 1998?

I'm creating a faster search engine for coders, using good old literal search, with synonyms for programming operations in different languages, i.e. array.push in JavaScript, array.append in Python, array[] in PHP, and so on. The database is loaded in memory instead of relying on huge analytics libraries, and searching is performed instantly. I see no need to protect my DB, since it contains basic snippets, and that allows these fast queries. I move between several languages and needed a super quick reference without all the SO clones and spam.
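To make that concrete, here's a guess at the kind of structure involved - a small in-memory table mapping a canonical operation to its literal spelling per language, searched by plain substring match (the data and names are invented, not the parent's actual schema):

    # In-memory synonym table: one canonical operation, literal names per language.
    SNIPPETS = {
        "append to array": {
            "javascript": "arr.push(x)",
            "python": "arr.append(x)",
            "php": "$arr[] = $x;",
        },
        "array length": {
            "javascript": "arr.length",
            "python": "len(arr)",
            "php": "count($arr)",
        },
    }

    def lookup(query: str) -> dict:
        q = query.lower()
        return {op: langs for op, langs in SNIPPETS.items() if q in op}

    print(lookup("append"))  # effectively instant: a dict scan, no index needed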


Link to the repo, please?


> So. Anyone fancy creating the 2022 version of what Google was in 1998?

Well... it's not fair!!! In 1998, most of the content was published by enthusiasts for fun, the "business web" didn't really exist yet, and Google was a minor player, so nobody was interested in "gaming" the search engine, because there wasn't really any money in it.

Fast forward: now the web (and apps) is a major driver for ANY business, so there's a lot to win (or lose) by gaming the system. So SEO and other user-hostile strategies are widespread (and no AI or algorithm will be good enough to change that).

Solutions that could help? Maybe:

1) Meta search engine: cross-verifying and re-ranking with several different search engines might help a little (more different algorithms for SEO to game); see the sketch after this list.

2) User-validated search: let users rank a website AFTER they've checked the page, to exclude bad actors. It's gameable too (like comments and Amazon rankings).

3) Not-for-profit web: exclude businesses from the search engine (though information given away without a sale is a kind of pre-sale anyway).

Either way, I don't see many ways to devise an un-gameable system.
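A minimal sketch of option 1, merging rankings from several engines with a simple Borda count (engine names and result lists are invented):

    # Borda-count merge of rankings from several engines: a page earns more
    # points the higher each engine ranks it, and the totals decide the order.
    from collections import defaultdict

    rankings = {
        "engine_a": ["site1.com", "site2.com", "site3.com"],
        "engine_b": ["site2.com", "site1.com", "site4.com"],
        "engine_c": ["site2.com", "site3.com", "site1.com"],
    }

    scores = defaultdict(int)
    for results in rankings.values():
        for pos, url in enumerate(results):
            scores[url] += len(results) - pos

    merged = sorted(scores, key=scores.get, reverse=True)
    print(merged)  # a spammer now has to game three ranking algorithms at once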


We are trying - with no-tracking principles and practices https://blog.mojeek.com/2021/05/no-tracking-search-how-does-...


Blogspam and SEO folks are essentially adversaries to good ranking strategies on search engines. In 1998 there weren't many adversaries and it was mostly a technical issue. Now the game is much more complicated.


Would you be willing to pay a subscription fee to a search engine?


> Now after a few weeks of that, I see that Bing does the exact same thing much of the time. Totally different queries + same general subject = identical top ten results.

When I worked at Microsoft, the Bing team had an internal version employees could use where you could report if Bing's top results didn't equal Google's top results.

This was nearly a decade ago, so don't read too much into it for what they do now.


This is awful when you have an error message where the text doesn't vary much, if at all, and you have a varying error code. You'll get results for all of the errors with similar error messages, but with different codes.


I've noticed this a lot recently as well. It's not really the search results themselves that are bad, but rather Google simply ignoring key words/phrases in my query.



