IIRC they literally installed a Bing toolbar and agreed to turn on a feature that indexes all the sites that the user visits. To then blame Bing for indexing their decoy page, when Google themselves gave permission to index it, is absolutely laughable.
This is 7 years old though; I'd be curious to see whether the experiment is still repeatable today.
I've noticed personally that Bing has gotten a lot better, but that could just be because it's still a delayed version of Google.
Microsoft may even be using IE to harvest data to reverse engineer Google's search algorithm. Even if they make an inferior version that's 90% as good, that's still good enough to keep them relevant in the market.
I actually worked at Bing while all this was going down. It was a long time ago so I don't remember all the details, but when this surfaced it was a big deal internally.
In fairness to Microsoft/Bing, the data collected as part of the Suggested Sites feature (and others in IE) was pretty clearly stated as being used to anonymously improve other Microsoft products, and so it became one of many sources of data for Bing's index. If you have a ton of data about users' traffic patterns on the web, you'd be crazy not to use it in an aggregate way to infer which URLs satisfy a user's intent at any given time. The irony is that nothing was specifically targeted at using data from Google; the machine learning algorithms in question just naturally learned that people were typically satisfied by the sites they visited after searching for a term on Google, and therefore boosted those sites by some amount in Bing's search results. No human was in the loop deciding that. In the contrived cases used by the Google engineers in the blog post, the Google results were simply the ONLY sites the index returned for the given search terms, as no other part of the index found anything.
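To make that concrete, here's a minimal sketch (Python, with all names and counts made up for illustration) of how an aggregate clickstream feature like that behaves; for a synthetic term it is the only non-zero signal, so the planted URL surfaces by default:

```python
from collections import Counter

# Hypothetical aggregate of toolbar telemetry: how often users who
# searched a term (on any engine) ended up satisfied on a given URL.
click_counts = Counter()
click_counts[("hiybbprqag", "example.com/theater-seating")] = 18

def clickstream_feature(term, url, weight=0.1):
    # One ranking feature among many: boost URLs that historically
    # satisfied this term. Nothing here targets Google specifically;
    # the signal just happens to include clicks made after Google searches.
    return weight * click_counts[(term, url)]

# For a nonsense term, every other feature scores zero, so even a tiny
# clickstream boost makes the planted page the top (and only) result.
print(clickstream_feature("hiybbprqag", "example.com/theater-seating"))  # 1.8
print(clickstream_feature("hiybbprqag", "example.com/unrelated"))        # 0.0
```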
After the issue was discovered, there was a huge push by the search relevance team to remove that data source. IIRC a new version of the index launched shortly afterwards that was stripped of all the user data features collected via IE, but maintained relevance parity through some amazing work by the team. I'd be extremely surprised if the experiment was repeatable today - I think they're much more careful about these things now.
I don't think there's any intentional "reverse engineering" going on, the Bing team has a ton of engineers working on novel relevance techniques. At the time that I left, we were actually beating Google on our internal relevance metrics (and developing tougher new metrics in order to be able to keep measuring progress). EDIT: For evidence of this, one only needs to look at the volume of research that Microsoft publishes each year for SIGIR: https://www.microsoft.com/en-us/research/event/microsoft-sig...
Well, you could sort of do this experiment yourself, with a bit of preparation and a decent bit of patience.
- Prepare a new small website on an extremely obscure topic; something a search engine will consider uninteresting and not worth ranking highly, but just legitimate enough to warrant maintaining in the search index. On top of obscurity, don't link to the site from anywhere. This is so the site doesn't wind up crawled by Bing.
- Manually add the site to Google's index. (I forget exactly where, but there's a webmaster tool that can be used to manually submit URLs; then you wait and see if the crawler paid any attention and added it to the index.)
- Add a single webpage to the site that, in amongst all the other words, contains a single random word, or a series of random words. Now repeatedly search Google for the term, on a machine nowhere near a Windows system, and wait to see if your site shows up. It will take some time to get this working: unlike Google's fully contrived test, actually getting garbage/nonsense words into the index takes effort, as (in my experience) the index seems primed to prefer valid words over garbage. On top of this, your nonsense word will be (if you're doing it right) the only result in the world for that word, so the page may be in the index but carry some attribute marking it below a "show in results" threshold, if such metrics exist.
- Now that you have your magic term, do what Google did, and repeatedly search Bing for the term from IE. (A rough polling sketch follows this list.)
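If you want to semi-automate the waiting, here's a rough sketch (Python; the decoy site is hypothetical, and scraping result pages may be rate-limited or against the engines' terms, so treat it as illustrative only):

```python
import random
import string
import urllib.request

def make_nonsense_term(length=10):
    # A random lowercase token, comparable to "hiybbprqag", that should
    # have zero results anywhere on the web before your decoy page exists.
    return "".join(random.choices(string.ascii_lowercase, k=length))

def site_in_results(search_url, term, site):
    # Naive check: fetch the result page and look for the decoy domain.
    # Real engines CAPTCHA and block automated queries, so their official
    # APIs would be the robust route; this just shows the polling loop.
    req = urllib.request.Request(
        search_url.format(q=term),
        headers={"User-Agent": "Mozilla/5.0"},
    )
    with urllib.request.urlopen(req, timeout=10) as resp:
        return site in resp.read().decode("utf-8", errors="replace")

term = make_nonsense_term()
print("decoy term:", term)  # put this word on your decoy page
# Later, poll each engine (say, daily) for the term:
# site_in_results("https://www.google.com/search?q={q}", term, "my-decoy-site.example")
# site_in_results("https://www.bing.com/search?q={q}", term, "my-decoy-site.example")
```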
In the actual experiment, Google engineers manually inserted the data into their index. I don't know if my own activities would result in my fake site getting indexed by them, and I also don't know how to ensure that I am the ONLY result, which was the case with the Google test.
Once the test site and its single webpage were successfully added to Google's index, and the nonsense term returned the test page, a toolbar-to-Google search from IE would just send the keyword off to Google, the search result would come back (from Google), and...
Or I'm completely missing something. This is possible.
With the massive size of their infrastructure, I'm guessing it would be very difficult to conclusively identify a single query, let alone attribute it to Bing.
Also, the process you describe isn't conclusive, since it's possible that someone else randomly searched for "hiybbprqag".
Finally, it's a stronger argument the way it is. Otherwise, it's just Google's word against MS's. With their method, they could show that they could predict the search results by sealing a prediction before running the search on Bing. It's independently verifiable.
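For what it's worth, one simple way to seal such a prediction (not necessarily what Google actually did; this is just the commit-reveal idea) is to publish a hash of it beforehand and reveal the prediction afterwards:

```python
import hashlib
import json

# Commit: before running the search on Bing, publish only this digest
# (e.g. in a dated blog post or with a third party).
prediction = {"query": "hiybbprqag", "expected_result": "la-theater-seating-page"}
digest = hashlib.sha256(json.dumps(prediction, sort_keys=True).encode()).hexdigest()
print("published beforehand:", digest)

# Reveal: after Bing returns the planted result, publish the prediction
# itself. Anyone can re-hash it and verify it matches the earlier digest,
# proving the prediction wasn't made up after the fact.
```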
No, it's not that Bing was redirecting traffic through Google. Instead, it was that Windows/IE8 was sending data on search terms and result clicks on Google from users' machines to Microsoft.
So a user searches Google for a term and then clicks on a result. That data is sent to Microsoft, and Bing is updated accordingly (and will return that result for that search term) at some later point in time.
They weren't claiming that when you searched Bing, Bing did a Google search immediately in the background. Rather, they claimed that Microsoft was collecting Google's rankings and using them in Bing's own ranking algorithms. They would never have collected any data on hiybbprqag without those earlier searches and clicks.
No, that was never how it worked, or Google would have sued and probably won.
What actually happened was that, if a user had Bing's search bar installed in IE, the fine print of the toolbar installation declared that the keywords they searched for and the links they clicked on from other search engines could be sent to Microsoft's servers.
From the article:
>We gave 20 of our engineers laptops with a fresh install of Microsoft Windows running Internet Explorer 8 with Bing Toolbar installed. As part of the install process, we opted in to the “Suggested Sites” feature of IE8, and we accepted the default options for the Bing Toolbar.
So the toolbar would send a (keyword, URL) pair, i.e. {"alsdkjfsd", "example.com/alsdkjfsd.html"}, to Bing's crawler, and Bing's crawler would include the page in its search index after crawling it. Perhaps the rank of the page in the search results was included as well. This was a way for Bing to discover unindexed pages.
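To illustrate the receiving end, here's a sketch of what such a discovery pipeline might look like (the record shape and field names are guesses; the real telemetry format was never published):

```python
from collections import defaultdict

# Hypothetical toolbar click records: the term typed into some search
# engine, and the URL the user clicked afterwards.
click_log = [
    {"keyword": "alsdkjfsd", "url": "example.com/alsdkjfsd.html"},
    {"keyword": "alsdkjfsd", "url": "example.com/other.html"},
]

crawl_queue = set()
keyword_hints = defaultdict(set)

for record in click_log:
    # Previously unseen URLs become crawl candidates for the index...
    crawl_queue.add(record["url"])
    # ...and the keyword-to-URL association can later serve as a
    # relevance hint once the page has been crawled.
    keyword_hints[record["keyword"]].add(record["url"])

print(sorted(crawl_queue))
```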
One could argue that it's scummy but legal, and that the user also technically owns the data of ("what I searched for", "what URL I visited"), rather than Google being its exclusive owner, and so is free to send it to third parties. That could be a reason Google did not take legal action over it.
Google's IE toolbar also sent a ton of data to Google.
It's really bizarre that MS (Bing) can still claim they don't use Google results after such an investigation, with fake results showing up identically in both places despite originating on the Google side.
Come on Bing, just own up!
Bing's excuse (at the time, in 2011) was that this data was seeded by Internet Explorer, the Bing toolbar, and other user behaviour within their products.
I'm curious whether the Google engineers searched for any of the 100 search terms using IE or a Microsoft service... but I doubt that. There had to be some level of scraping. It's pretty embarrassing.
Bing's autocomplete suggestions also matched Google's perfectly, which is what initially caught Google's attention.
> I'm curious whether the Google engineers searched for any of the 100 search terms using IE or a Microsoft service... but I doubt that.
From the article:
>We gave 20 of our engineers laptops with a fresh install of Microsoft Windows running Internet Explorer 8 with Bing Toolbar installed. As part of the install process, we opted in to the “Suggested Sites” feature of IE8, and we accepted the default options for the Bing Toolbar.
> We asked these engineers to enter the synthetic queries into the search box on the Google home page, and click on the results, i.e., the results we inserted. We were surprised that within a couple weeks of starting this experiment, our inserted results started appearing in Bing. Below is an example: a search for [hiybbprqag] on Bing returned a page about seating at a theater in Los Angeles. As far as we know, the only connection between the query and result is Google’s result page (shown above).
In essence, it sounds like a bit of keylogging and data-sharing with external sites. From the article, even Google couldn't figure out exactly what was going on technically. Makes me wonder what else Microsoft was recording and sharing at the time, or whether it was limited to google.com.