Perhaps surprisingly to the HN community, a lot of what is coming out confirms what decent SEOs have been saying for years (and appears to contradict a lot of what Google have said publicly via their various talking heads)
Long time dev/designer and part-time SEO person here.
The major implications are that a lot of what Google was telling SEO practitioners were a lie or cleverly disguised red herrings wrapped up in semantics.
The article that was posted earlier today went through a lot of the stuff they were lying about with examples of Matt Cutts and Gary Ilyes statements that are now directly contradicted with the recent release of this code.
What I’ll do here is contextualize some of the most interesting ranking systems and features (at least, those I was able to find in the first few hours of reviewing this massive leak) based on my extensive research and things that Google has told/lied to us about over the years.
“Lied” is harsh, but it’s the only accurate word to use here. While I don’t necessarily fault Google’s public representatives for protecting their proprietary information, I do take issue with their efforts to actively discredit people in the marketing, tech, and journalism worlds who have presented reproducible discoveries.
Makes sense. I used to have a youtube channel that got a few age-restricted videos (for absolutely silly reasons). Reps from youtube would swear up and down that age-restrictions wouldn't effect your video's ranking in the algorithms but it was so well known in most creator circles how untrue that was.
Thank you for that link, I'll have to check it out!
Normally I’m not here to defend the likes of Google, but when it comes to their communications to SEO practitioners, I’m OK with some intentional misdirection and outright lies.
The "lies" there are such bullshit. They basically consist of "Google claimed they didn't do X" now it looks like they probably do X and therefore google lied.
They are somewhat circumspect about the fact that each example is actually, 8+ years ago Google made a claim about what they did at the time and now there's evidence that they are currently doing doesn't match that claim. That's not a lie.
A lot more spam than usual in your search results.
A leak like this (if it’s actually as substantial as people say it is) combined with google’s ham fisted way to insert LLM results into its search could really mean that google search quality will crater in the coming months, potentially opening up space for competitors or reducing usage in a real way.
Since this seems to be most relevant to SEO practitioners I might as well ask a loaded question. Given how prevalent spam sites are in search results now and how bad Google seems to be at filtering them out, what's the argument for how SEO is a good thing in aggregate?
Are you asking whether people want discoverability for their product/companies ?
Or if they're willing to give up agency and leave it to a third party for-profit corporation to dictate what is surfaced in people's generic searches ?
I'm asking if optimizing for clicks in search results rather than optimizing for quality of content and letting search engines attempt to determine what's worth surfacing was overall a good thing for users.
That's a valid question, but there is no answer to what "optimizing for quality of content" means, and every entity involved has a different view of it.
We'd have to decide what is quality and what is content first, and I'd expect the heat death of the universe before we ever come to a useful definition that even matches half people's own definitions.
Depends on the context. I'm a long time dev/designer and part-time SEO person.
Back in the day (early to mid aughts) as an SEO person you really had to work to get your site noticed. You have to develop inbound links, create really good content, set your site up well for Google to index, used other resources like blogs and social media (when it was still in the early stages) to boost your positioning. In short, it was a full time, month-to-month job. The sites I worked on it was a constant battle to get a decent ranking and to maintain it.
But you are 100% right. Somewhere in the last 8-10 years, everything has gone from search engines really protecting who they put on the front page, to loading the page with tons of ads (half of the results above the fold are now all ads, followed by more ads below the 10th position) and making it insanely easy to manipulate Google and others to get your site on the front page.
Great example:
In 2019, I had a medical device company that was a startup. Out of the blue they called me because of my prior relationship with one of the founders parents and the site I built for her. They needed a site designed, built, optimized and 15 pages of content written in less than three months. I gave them some insane price and they didn't even balked and said if I got done sooner, they would send me a bonus.
I grabbed a template somewhere online, revamped the home page and the internal content pages and then proceeded to copy/paste content from other well known and not so well known sites in their industry. I did everything you should never do from an SEO standpoint. I figured it was a long shot, but worth the payoff. I got their site released on time, but the majority of the content was copied from other sites. Very little of it was original.
I figured it would get buried in the first month. Nope. #3 on the first page for various searches I targeted. It was outranking the sites I copied the content from, it was crazy. To this day, the site remains in the top three places on the first page of Google and other search engines for dozens of searches I targeted. It was outranking huge medical supply companies in the same industry - all as a startup.
That experience in 2019 was a huge wakeup call for me. It was blatantly obvious how easy it was to manipulate Google and other search engines. As of today, I have no idea if SEO is really a worthwhile pursuit any more considering how easy it is to do this. If I can do it, then I just assume everybody else is already doing it. If not, then they're missing out.
> Though the documents aren’t exactly a smoking gun
Smoking gun of what? The article never seems to actually say.
The only accusation I see is that SEO grifters were lied to by Google Search docs, but you look at the article like[1] it's over things like "Google always said Domain Authority isn't a thing, but we found a variable site_authority. We don't know what it's for or what it does, but clearly we were right". Which, even if true, isn't a thing I care about at all...
I believe for a while that Google didn’t consider Domain Authority much.
For a while, I would create pages like “Bank of XYZ Phone Numbers” or “Phone Numbers for $ThatTelecomYouHate” and often rank #1.
Which made sense, because my page was just a very parseable page of their actual phone numbers, but obviously risk for harm there.
Phone numbers on the actual orgs’ websites were hard to find because, well, they wanted you to do anything but call them because calling cost them money. And the big corp websites were always an SEO mess that I’m not surprised a crawler could not comprehend.
Of course the ads were all for competitors, so I enjoyed making money and costing them customers at the same time.
After some years, Google would start returning you 10 different results from the orgs’ websites, playing into their hand of having you do anything but call your bank/telco.
Publicly lying about how a core product works is notable, and would be a big deal for any other company. Reminds me of Apple's "we treat all developers equally" statements before it was made clear Spotify didn't pay a dime on their in-app subs.
You're right that the real impact is limited though, people woun't be moving to another search engine anyway.
SEO discussion is notoriously high noise and low signal. It's mainly conjecture, assumption, and superstition. Like a lot of things, if they knew what worked, they wouldn't be sharing it.
That tells me literally nothing. This is all machine generated boilerplate code. This isn't executive emails or documents. Since when are comments written by nameless workers an authoritative newsworthy leak on corporate policy? It probably tells us something about what sort of things workers at Google have researched, e.g. csam, scams, racism. If I grep for SEO then I see some database fields for their pagespeed service that mentions accessibility audits and progressive web apps. Looks pretty consistent with what Google has always said publicly. Where's the specific code that's generating all the shock and outrage?
There are plenty of breakdowns of what exactly the technical documents contain, including the stuff I just said. In great detail! Google ironically here is your friend. It confirms what SEO practitioners have been saying for years that Google has actively been discrediting with flat out falsehoods, such as they don’t use domain authority anywhere in their ranking (documentation describes fields named domainAuthority), they track clicks in great detail (which they have vigorously denied), and they use chrome usage data as a factor in their rankings. Hope that helps
Regarding trade secrets, inadvertent or accidental disclosure means it is no longer a trade secret - no protection. And regarding copyright, there is Oracle vs. Google that copying APIs is fair use (Google's argument). So even if the copyright license grant in the leak is held to be invalid (the license states it is perpetual, if it is valid it does forever carry that license), it would be fair use anyway. I think the only IP protection is patents, Google has patents on various parts of the system so if you did actually want to build a Google clone by reverse-engineering their documentation you would run into trouble.
I think serving the docs themselves is likely illegal, though. You don't actually have a right to distribute copies just because someone pushed a wrong button and they were made visible in a repo that has a top-level Apache 2 license.
Sure, they can't stop people who received a copy from studying it and using or discussing the information in there. But I think they can almost certainly stop them from (legally) distributing that information to others.
Apache 2 is irrevocable, and you can redistribute it however you wish. I think it's safe to assume as soon as it was pushed public to the original google repo it's was covered under apache 2. that entire repo was distributed by google as Apache 2. So if you pull that commit, as far as I would be concerned everything in it is apache 2 since that's the license it's distributed with. We have no idea why that code was pushed, and it doesn't matter from a public perspective.
None of that matters if the company never intended to publish it as Apache 2, and can show they took immediate steps to correct their mistake. The law is not code, and intent matters.
source or relevant case? Because if someone pulls before it's another commit that removes it... you would have absolutely zero idea why they removed it, or you wouldn't even know it was removed. Or even if you still pull that commit that as of yesterday after over 1 month was still in the google repo, there's no info that it shouldn't be there... And I think pushing code to a public opensource repo, shows intent to publish it...
I don't think this leak shows that they use Elixir. It's open-source code that Google customers are supposed to be able to use to interact with Google APIs. It's natural to provide support for your customers to use the languages they want to use, rather than the ones you've settled on internally; this is important for anyone, let alone restrictive internal environments like Google's is supposed to be. (https://aws.amazon.com/sdk-for-php/ exists, but Amazon famously doesn't allow services to be developed in PHP.) Google is famous for allowing only a few languages for production code.
The README also says:
> Disclaimer
> This is not an officially supported Google product.
which, while it doesn't directly indicate that Elixir isn't used internally to Google, would be a surprising mismatch of support if they did use Elixir. Here are their official clients (including some in languages that I doubt they want to be used internally for service development, like PHP, Node.js, and .NET): https://developers.google.com/api-client-library
That statement about not being a supported Google product is just boilerplate that they make you put on Google open source products. I wouldn’t read much into it. I say this as an ex-Googler who open sourced several projects and had to put that disclaimer on them.
Maybe I'm missing something but all the document seems to imply is that Google collects this data... Nothing about it being used as ranking factors. It doesn't seem unreasonable for Google to collect and correlate extra data that it may want to use as future ranking factors, experiments, quality analysis etc.
Ouch. Some pretty damming stuff there. I can see spokespeople getting something wrong, but multiple points that are seen as highly contentious in SEO land being explicitly denied and the docs say opposite? Not much wiggle room there
https://sparktoro.com/blog/an-anonymous-source-shared-thousa...