All these indignant comments but nobody bothers to look up what happened since 2020:
> In a previous blog post, we quantified that upwards of 45.80% of total DNS traffic to the root servers was, at the time, the result of Chromium intranet redirection detection tests. Since then, the Chromium team has redesigned its code to disable the redirection test on Android systems and introduced a multi-state DNS interception policy that supports disabling the redirection test for desktop browsers. This functionality was released mid-November of 2020 for Android systems in Chromium 87 and, quickly thereafter, the root servers experienced a rapid decline of DNS queries.
“I’m the original author of this code, though I no longer maintain it.
“Just want to give folks a heads-up that we’ve been in discussion with various parties about this for some time now, as we agree the negative effects on the root servers are undesirable. So in terms of “do the Chromium folks know/care”; yes, and yes.
“This is a challenging problem to solve, as we’ve repeatedly seen in the past that NXDOMAIN hijacking is pervasive (I can’t speak quantitatively or claim that “exception rather than the norm” is false, but it’s certainly at least very common), and network operators often actively worked around previous attempts to detect it in order to ensure they could hijack without Chromium detecting it.
“We do have some proposed designs to fix; at this point much of the issue is engineering bandwidth. Though I’m sure that if ISPs wanted to cooperate with us on a reliable way to detect NXDOMAIN hijacking, we wouldn’t object. Ideally, we wouldn’t have to frustrate them, nor they us.”
A way for DNS root servers and Chromium devs to cooperate that couldn’t be hijacked by domain redirecting ISPs would be nice.
Fix what, though? This is an insignificant load on a piece of core infrastructure, for which all of the load from all of the users of the world's most popular Internet application is still a rounding error relative to capacity. This is what those servers are there to do!
Those servers are for looking up real domains; they're not designed to be a detection mechanism for malicious actors. Chrome intentionally fires unresolvable, uncacheable domains at the root servers. That's dishonest use of public infrastructure at the very least.
I doubt Google would be happy if Microsoft or Apple built a network-availability tool into their systems that executes random Google searches that won't return a result, just to check whether searches are being redirected.
Says who? The RFCs disagree: just look at all the resource record types. The DNS standards people have been saying for years that the DNS should also be a secure object storage system, because that's a capability DNSSEC brings. Unlike that purely hypothetical use case, this is real people using the DNS roots for an actually important day-to-day function. There's nothing dishonest about what Chrome is doing: it's sending unresolvable, uncacheable names specifically to keep ISP DNS servers from caching them and tricking users into believing that NXDOMAINs are clean.
This isn't Google using a competitor's resources. It's Google's users using core Internet infrastructure. It's core infra for a reason. We're not required to use it parsimoniously.
I have the impression that most of the concerned comments on this thread believe that Chrome's NXDOMAIN detection is somehow a burden on the Internet roots. It is not. It's prominent on a graph, and that's all. The Internet roots are, by necessity, designed to handle immense amounts of traffic.
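For anyone who hasn't looked at what the probe actually involves, here is a minimal sketch of the general technique being argued about -- random, uncacheable single labels checked through the system's configured resolver, expecting NXDOMAIN -- and not Chromium's actual code:

```python
import random
import socket
import string

def random_probe_name() -> str:
    """A random single label, roughly like the 7-15 character probes
    described in the article (illustrative, not Chromium's exact logic)."""
    return "".join(random.choices(string.ascii_lowercase, k=random.randint(7, 15)))

def nxdomain_looks_intercepted(probes: int = 3) -> bool:
    """Resolve a few nonsense names through the system's configured resolver.
    If any of them 'resolves', something upstream is rewriting NXDOMAIN."""
    for _ in range(probes):
        try:
            socket.gethostbyname(random_probe_name())
            return True      # a nonsense name got an answer: likely interception
        except socket.gaierror:
            continue         # a failure (NXDOMAIN) is the honest result
    return False

if __name__ == "__main__":
    print("NXDOMAIN interception suspected:", nxdomain_looks_intercepted())
```

Because every probe name is freshly random, no upstream cache can answer it, which is exactly why these queries fall all the way through to the root servers.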
Chrome could just behave like a resolver. Or operating systems could start behaving like resolvers on ::1.
DNS was designed with trusted resolvers in mind (was it?), but we've shifted towards using resolvers of our ISPs and random companies.
I can see the appeal of caching across a WAN, but then again, we don't use HTTP proxies anymore, which provided a similar caching feature. Integrity protection should be placed higher on our priority list than caching effectiveness.
Maybe the extra root-server load would also decrease with self-hosted resolvers, since Chromium's random-TLD queries could be answered from a locally cached root zone that stays valid for days.
Actually, thinking about caching the root: why don't resolvers already cache it? Chrome wouldn't have such an impact if resolvers remembered the NSEC records from the root (aggressive NSEC caching, RFC 8198). DNSSEC already solves this problem today.
If you're suggesting that DNS was designed with endpoint resolvers, rather than upstream caching resolvers, in mind: no, it's the opposite. The DNS architecture assumes that your browser relies on a caching DNS server somewhere.
DNSSEC is hardly deployed at all and doesn't really enter into this story. It doesn't change anything about what Chrome is trying to do, and, were it actually deployed, would dwarf all other causes of increased root server loads by itself.
In almost any other scenario, this traffic would be indistinguishable from a distributed denial of service (DDoS) attack.
Yes, but see, it's not a distributed denial of service attack. It's a productive use of a capability the DNS roots were designed to serve. It's a huge amount of traffic because Chrome is one of the 3 most important and popular applications on the Internet, and the most popular of those, at that. The cart doesn't drag the horse.
And the author's incorrect claim that "interception is the exception rather than the norm" doesn't help. The author tries to compare with Firefox, but Firefox's captive portal test uses an existing known URL [1], which has a very different goal from Chromium's NXDOMAIN interception test. In fact, I think Firefox would have to implement a similar approach if it were popular enough.
I believe the point they were making is that they could use their own domain and infrastructure for the same kind of testing. Something like "<random-chars>.nxdetect.firefox.com" would keep the responsibility for the feature on Firefox's name servers. The same could be done for Chromium rather than contributing to a tragedy of the commons situation.
There is an obvious flaw which is that this behavior is detectable and avoidable by ISPs that become aware of what that domain is being used for... But that argument applies to the current detection method as well (as this post demonstrates).
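A rough sketch of that alternative, with nxdetect.example.net standing in for a hypothetical vendor-controlled zone (the suffix and helper are illustrative, not anything Firefox or Chromium actually ship):

```python
import random
import socket
import string

# Hypothetical vendor-controlled zone; placeholder name, not a real probe domain.
PROBE_SUFFIX = "nxdetect.example.net"

def controlled_zone_probe() -> bool:
    """True if a random name under the vendor's own zone unexpectedly resolves,
    i.e. something on the path is rewriting NXDOMAIN answers."""
    label = "".join(random.choices(string.ascii_lowercase, k=12))
    try:
        socket.gethostbyname(f"{label}.{PROBE_SUFFIX}")
        return True
    except socket.gaierror:
        return False
```

The probe names stay random so caches can't answer them, but authority for the suffix -- and with it the query load -- lands on the vendor's own name servers rather than the roots.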
> But that argument applies to the current detection method as well (as this post demonstrates).
No, because the post is a post-mortem analysis. If ISPs are going to block this, they will have to do so live, and as presented I don't think they can do that without causing disruption.
Sorry, I realized it's not directly covered by the post; I read between the lines a little bit. What I read (between the lines) is that there is only a limited set of truly valid TLDs, all of which are well known. None of the randomly generated Chrome domains will ever resolve to a real address because the random characters are not valid top-level domains. A malicious filtering actor could fairly easily change their redirect behavior to only apply to NXDOMAINs issued for valid TLDs and bypass this check.
The original point stands: this feature abuses a common good at the expense of an operator (by the metrics given in this post, requiring 2x the capacity that would otherwise be needed) for a niche feature that they could run with equal effectiveness against a domain they control instead.
It's not abusing anything. All it's doing is creating a prominent feature on a graph. The roots are designed to handle this kind of traffic, required to do so, and more than capable of doing so. Application developers are not required to use the DNS parsimoniously, and never have been.
The roots are a "common good" and yes they're designed to handle this level of traffic because they have to. I also agree application developers don't have to be generally restrained in their use of DNS when using the protocol for its intended purposes.
Looking up domains that are intended not to exist, where the expected result in the general case is a failed query, is not using the DNS as intended. As I mentioned, this could be solved by the application using a domain it controls, so it isn't excessively consuming a public good at the expense of others (someone does have to pay for these servers).
The roots should not be expected to double their capacity because one application implements a feature in bad faith.
The roots aren't "doubling their capacity". They're not doing anything resembling doubling their capacity. The only thing that "doubled" is a line on a graph. If your conception of the root servers is that they're generally redlining and we're just scraping by with the capacity we need for current load: no. If we don't have a shared understanding of the facts here, we can't have a productive discussion about this topic.
I think "productive" depends on who's paying the bill and whether the uses of it were how the system was actually designed. I don't think that the roots ever had the idea of serving random requests billions of times because that's the entire point of having the downstream servers and DNS caching. I remember a time when directly pinging the root servers was considered very bad practice because you should always be using the downstream... Google's method essentially makes those downstream servers useless.
All of these norms about DNS queries are deployed situationally. Namedroppers types angry about excess root queries today will defend the notion of end-user operating systems querying the roots directly to facilitate DNSSEC (which without direct recursive lookups from end-systems condenses down to a single "yes I did DNSSEC" bit set by your local DNS server).
Everybody can look at this DNS probe code and kibitz about whether it could have been done more efficiently or whether it should be done at all, but this is a straightforward use of the DNS to do something important for end-users (detect whether their ISP's nameserver is lying to them about NXDOMAIN). Whatever else it is, it isn't a DDOS attack, and whatever else these servers are, they're not quiet out-of-the-way private systems; they're core Internet infrastructure.
People seem to think DNS root queries are answered by monks or something. There are ~1000 anycast sites serving the 13 roots and their load is close enough to zero to be ignored. Even if Chrome causes half of the traffic, who cares?
Your house must look like a hoarder house, then. I mean, there was all that space in there, and space is meant for filling, so why not fill it up? /s
Like any computer system, Root DNS was designed around expectations of typical utilization and expected response time, with margins for burst traffic. Capacity was left unused for a reason.
Google's action here is an example of cost shifting. It doesn't cost Google or the individual user anything to test for a hijacked DNS resolver, because the costs are shifted to the Root DNS. As the article puts it, this cost shifting is indistinguishable from a DDoS because it consumes network capacity and CPU resources and electrical power in an unplanned fashion.
Root DNS was designed around resolving names, not testing for hijacked DNS. And just because you can use DNS to test for hijacked DNS doesn't mean you should.
> Root DNS was designed around resolving names, not testing for hijacked DNS.
Says who? You? Have you double checked that belief with the RFCs? Last I looked, there's a lot more than just name->IP in the DNS. "Testing for hijacked DNS" is a DNS function. The root servers are there to make applications work, not the other way around. They're fine, and people watching them don't dictate terms to the browsers.
Well, I just searched RFCs 1034, 1035, 1101, 1183, 1348, 1876, 1982, 1995, 1996, 2065, 2136, 2181, 2137, 2308, 2535, 2673, 2845, 3425, 3658, 4033, 4034, 4035, 4343, 4592, 5936, 5966, 6604, 7766, 8020, 8482, 8490, and 8767 for the word "hijack", and the only reference I can find is rfc2845 §4.4, and that's referencing using TSIG to prevent man-in-the-middle TCP connection-hijacking of a DNS-over-TCP request.
Nothing that I can find there about checking for hijacked DNS. Did you have a specific RFC/section in mind that you were thinking of?
Answering the implied question in the title: ~45% of traffic to root servers is now Chromium users figuring out whether nonexistent domains get correct nxdomain responses.
That's apparently 60 billion queries per day. Doing the math, a query is some 92 bytes on the wire (50 bytes of UDP payload, but the root servers are presumably on Ethernet or something similar, so I'll include headers), so roughly 500 megabits per second across the whole root server system is taken up by this, assuming there are no spikes and everything is perfectly averaged. Edit: and that's just the downlink. I forgot the uplink traffic, which will be larger because DNS echoes the query back with the response(s) attached, plus DNSSEC... Whoa. /edit.
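The same back-of-the-envelope arithmetic, spelled out (the 92-byte on-the-wire size is the assumption from the comment above):

```python
# Rough figures from the comment above; the 92-byte frame size is an assumption.
QUERIES_PER_DAY = 60e9
BYTES_PER_QUERY = 92          # ~50 bytes of UDP payload plus IP/UDP/Ethernet headers
SECONDS_PER_DAY = 86_400

qps = QUERIES_PER_DAY / SECONDS_PER_DAY            # ~694,000 queries per second
mbps_total = qps * BYTES_PER_QUERY * 8 / 1e6       # ~511 Mbit/s across the whole root system
mbps_per_letter = mbps_total / 13                  # ~39 Mbit/s per root letter, before anycast fan-out

print(f"{qps:,.0f} qps, {mbps_total:.0f} Mbit/s total, {mbps_per_letter:.0f} Mbit/s per root letter")
```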
So what's the next step, Google generously stepping up to running (some of) those root servers and gaining further control and observability over what people do online?
Reading further, Firefox solves this by "namespace probe queries, directing them away from the root servers towards the browser’s infrastructure".
Not only could you do so, if you did not care about reliability or latency you could host the entire DNS root on one desktop computer. Global root DNS traffic is nothing.
Google doesn’t need to offer anything because the servers are holding up just fine under the load. These systems are trivially simple and cheap to operate by global internet standards.
Once again a typical tragedy-of-the-commons problem: some actors (in this case, providers) became too greedy and fucked up service for people, and because of the defenses such practices make necessary, the commons (in this case, the operators of the root DNS servers) now suffers.
On top of that come captive hotspot/corporate portals - and here especially, I do wonder why messing around with DPI is necessary, when every public network has a DHCP server running that could be used to distribute DHCP options carrying the portal URL, or indicating whether the connection should be considered metered (mobile phone hotspot) or bandwidth-constrained (train or bus hotspots).
If you're going for network login pages, you can intercept any of the standard HTTP URLs. There are specific URLs by Apple, Google, Microsoft, Mozilla, and if you're willing, a bunch of Linux distros as well, that solely serve to detect MitM redirects and show login prompts.
DHCP isn't reliable as many clients don't do anything with advanced settings you provide. Good luck getting a phone to accept the proxy server you've configured over DHCP.
This legitimate-ish DPI usually runs within the network itself; it doesn't traverse the uplink, so it doesn't cause any load there.
This isn't a cat-and-mouse game between DPIs and the browser; it's a hamfisted ad-tech tactic deployed by ISPs that straight-up breaks a browser feature, and the browser using the DNS to unbreak it.
60B queries a day is nothing; we're talking about root servers that serve as the foundation for the Internet as a whole.
The other thing people would be very surprised about is how old the software on those root servers is. Forget modern libraries in Rust/C++ and the like; it's pretty old tech that is very inefficient.
Edit: when talking about old tech, I mean the architecture of the DNS server used, the IO libraries and models, the caching, the data structures, and the like. A lot of work has gone into making web servers serve things very efficiently, for example; the same could be done for DNS servers.
> Our measurement based on the micro benchmarks shows that Rust is in general slower than C, but the extent of the slowdown varies across different programs. On average, Rust brings a 1.77x “performance overhead” compared to C. [1]
Of course, for the sake of completeness it should be noted that:
> With the run-time checks disabled and the restrictions loosened, Rust presents a performance indistinguishable from C. [1]
Though I believe my original statement holds nonetheless, as disabling these restrictions disables (amongst other things) bounds and overflow checking, which is one of, if not the, major selling point of Rust.
As for C++, it depends on the features one uses. If one writes "just what one could do in C", then the machine code produced by the compiler will be (almost) exactly the same. This is because many C++ features exist only at the compiler level and compile down to (almost) the same instructions as the equivalent C code.
However, I would once again raise the question I raised above with Rust: if we use little to no C++ features, can we distinguish that codebase from a C codebase in any meaningful way? Assuming we write idiomatic code instead, the C++ code will behave somewhat slower due to factors such as:
- Automatic collection/object allocation. Data structures growing "on demand" generally perform worse than comparable "non-automatic" data structures grown in larger chunks by hand (using malloc/etc. in C). While this is admittedly an implementation detail, I believe libstdc++ does not use chunking, though I would not swear to that.
- Strings. While no doubt a big upgrade from \0-terminated char sequences, idiomatic C++ strings are less efficient, especially when it comes to concatenation or manipulation. In addition, they may lead to memory fragmentation, though this should be an afterthought most of the time.
In general, the performance difference between C++ and C comes down to "hidden" code. While by no means large, for software like the DNS root servers -- which run essentially 24/7 and will most likely continue doing so for quite a while -- even small differences in performance add up.
Admittedly, however, my original statement of
> Well written C code can easily blow a C++/Rust application out of the water.
may not have been well formulated. It would have been better to split the statement and be more specific about the individual performance differences of Rust and C++ instead of bunching them together.
What an arrogant hack. It's hard to believe Google engineers would find that a reasonable approach. If I'd seen that during a code review, I would have called it out and explained that you shouldn't abuse someone else's systems.
Then again, maybe it's revenge for everyone pinging 8.8.8.8
> What an arrogant hack. It's hard to believe Google engineers would find that a reasonable approach.
Reminds me of the recent "Go module mirror fiasco" where Google found it fair to clone repositories at a rate of ~2,500 per hour in order to essentially proxy Go modules.
That doesn't at all seem like what happened (the changes were in the works prior to the drama), but this is a high-drama tangent unrelated to the story here.
This isn't some tiny authoritative DNS server being flooded unexpectedly with queries. These are the Internet DNS root services. They have to keep up with this kind of traffic. It's their literal job description.
But I dislike the whole GOPROXY design: it breaks private repos by default, and you have to set env variables just to make the stupid tool download things from the server I told it to use.
Well, taking the context into consideration, I'd still say it's too much. Context being they were full git clones and the traffic ended up representing "70% of all outgoing network traffic from git.sr.ht".
This isn't an "abuse". These are the Internet root DNS servers. We don't design applications to tiptoe around them any more than we tiptoe around the capacity of core routers. There is a huge amount of capacity, and all of root DNS together is a rounding error on total Internet traffic.
Chromium is imposing the extra load on the system, so yes it is the abuser in that sense. That they are doing it for what is a good reason for the app's users is immaterial to whether the effect is bad for the root DNS servers.
If Karen takes my sandwich from the company fridge, so I take some of Jon's lunch, so I don't starve, I'm not innocent because Karen started it, I've created a situation where there are two arseholes instead of one. This isn't quite what is happening here as the root servers are effectively a public resource and stuff in the fridge is all private resources, but close enough to make the point.
> The networks that hijack DNS request should share some of the blame
They should have all the blame for deliberately breaking part of agreed protocols for their own gain.
But that doesn't make anything we do in response to that right by virtue of us doing it because we have been wronged.
They should, but Google's "solution" to the hijacking is the problem here. If you're going to hijack NXDOMAINs, you can simply exempt queries for non-existent TLDs from your scheme and Chrome will be none the wiser.
Google manages entire TLDs, surely they can use their own DNS servers for this purpose.
The problem is, there is no better approach to check for intercepting middleboxes. Using a well-known path (like gstatic.com/generate_204) works for detecting if the user has a captive portal between them and the Internet, but not if the user's provider messes around with DNS.
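For comparison, the well-known-URL style of check is roughly this simple (a sketch against the generate_204 endpoint mentioned above; treat the exact URL and status handling as illustrative):

```python
import urllib.request

# Endpoint mentioned above; it normally returns an empty 204 over plain HTTP.
PROBE_URL = "http://www.gstatic.com/generate_204"

def behind_captive_portal(timeout: float = 5.0) -> bool:
    """True if the probe does not come back as a clean 204, which usually
    means a portal or middlebox rewrote or redirected the response."""
    try:
        with urllib.request.urlopen(PROBE_URL, timeout=timeout) as resp:
            return resp.status != 204   # a 200 (after a redirect) is the portal talking
    except Exception:
        return True                     # no connectivity, or a badly broken intercept
```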
> there is no better approach to check for intercepting middleboxes
I can see some purposes for detecting middleboxes. I've done it. It usually doesn't involve DNS. It does involve certificate pinning though.
> detecting if the user has a captive portal between them and the Internet
That's easy. Try to browse to something. If it succeeds but the certificate isn't valid, then the user probably has a captive portal. That, or your pinned certificate has been revoked.
> detecting ... if the user's provider messes around with DNS
Certificate pinning, again, comes to the rescue. Pin a certificate to your own DoH server and then use DoH to look up whatever you need.
If you can't connect to your DoH server then you effectively aren't (or shouldn't be) connected to the internet.
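A bare-bones sketch of that lookup path, using Google's public DoH JSON endpoint as a stand-in for "your own DoH server"; the certificate pinning itself is only indicated in a comment, since a real pin depends on the server you actually run:

```python
import json
import urllib.request

# Public DoH JSON endpoint used as a stand-in; a real deployment would query
# its own DoH server and verify a pinned certificate before trusting anything.
DOH_URL = "https://dns.google/resolve?name={name}&type=A"

def doh_lookup(name: str) -> list:
    """Resolve a name over HTTPS, sidestepping the local resolver entirely.
    Status 3 in the JSON body is NXDOMAIN; type 1 answers are A records."""
    with urllib.request.urlopen(DOH_URL.format(name=name), timeout=5) as resp:
        data = json.load(resp)
    if data.get("Status") == 3:
        return []                       # NXDOMAIN, delivered over the encrypted channel
    return [a["data"] for a in data.get("Answer", []) if a.get("type") == 1]

print(doh_lookup("example.com"))
```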
The kind of interception that is being discussed here is the interception of DNS queries which would return an NXDOMAIN result to instead point to an advertising page. This is done by a lot of ISPs.
It is an interesting, tricky problem, since the DNS was designed around caching and an 'honor system'. How do you allow caching, so you don't completely overwhelm the root servers, yet still prevent modification/interception of the message contents? DoH sort of addresses it, but then you still have to trust that the message has not been altered/intercepted at the DoH server. Yet we also want the server to be able to replay that message to another client. And then there's retrofitting all of the existing code that relies on DNS as this magic system. It's as if we need a lightweight one-way TLS session, but with certificate caching and a second request/response type carrying a hash -- but then you have to watch out for cert forging.
That the browsers are reacting this way says we have a failure at the DNS software level. The browser projects do seem to be delivering the security that is wanted, so we are starting to get some fragmentation, which cannot be good. Perhaps we need new record types to support this?
> Perhaps we need new record types to support this?
The solution would have been DNSSEC; the problem is that authenticating NXDOMAIN responses comes with a ton of challenges of its own, so in the end there were just workarounds and messy hacks [1] that IIRC no one ended up using.
No, it's not! The problem here is end-users being forced to trust their upstream DNS caches. DNSSEC doesn't change that! It's a server-to-server protocol, so the same NXDOMAIN interception problem exists --- the intercepting DNS server just sets the AD bit in the header. You can run DNSSEC directly from an end-system, bypassing caches --- but then, relieving load on the roots (a non-issue, but still) is right out the window.
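To make the "run DNSSEC directly from an end-system" idea concrete, here is a rough sketch using the third-party dnspython library: ask a root server about a nonsense TLD with the DO bit set and check whether the NXDOMAIN comes back with NSEC proof material. Signature validation is omitted, so this only illustrates the shape of authenticated denial, not a real validator:

```python
import dns.message    # third-party: dnspython
import dns.query
import dns.rcode
import dns.rdatatype

ROOT_SERVER = "198.41.0.4"   # a.root-servers.net

def nxdomain_with_nsec_proof(name: str) -> bool:
    """Ask a root server directly, with DNSSEC OK set, and report whether an
    NXDOMAIN answer carries NSEC records in the authority section.
    Signatures are NOT validated here; this is only the shape of the check."""
    query = dns.message.make_query(name, dns.rdatatype.A, want_dnssec=True)
    response = dns.query.udp(query, ROOT_SERVER, timeout=5)
    if response.rcode() != dns.rcode.NXDOMAIN:
        return False
    return any(rrset.rdtype == dns.rdatatype.NSEC for rrset in response.authority)

print(nxdomain_with_nsec_proof("qwmzkplvtx."))
```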
They could do DPI: allow traffic for unknown domains through, and intercept everything else. Differentiating a random domain from "what a user might actually type by accident" is also pretty trivial.
Well... the root servers, and the DNS platform in general, provide name resolution. Traditionally we think of that as a feature/function: name -> IP. But there are non-functional aspects to consider as well. Distribution is one (DNS is really good at that); security (DNSSEC comes to mind) and multiple transport channels (e.g. DNS over HTTPS) are others. And as seen in this case, there is a non-functional need for interception to be detectable. DNS does not deliver that (yet), but it is clearly something the DNS platform needs.
So calling it abuse is wrong. It is a dirty hack around a missing feature. It is technical debt of the DNS platform, and the root servers suffer for it because ISPs and in-house DNS resolvers create the problem.
We can still be angry at Google anyway: they have enough money, enough people, and enough power to fix this either financially or as a feature within the DNS platform.
Google makes money when people use 8.8.8.8, so there's an even more sinister interpretation: Chrome has all its users DDoS other DNS servers to raise the table stakes of running one. Now your infrastructure apparently needs to handle a whole second Internet's worth of bogus traffic. Google, of course, has the resources to meet this higher bar.
All of this extra traffic is to support the misfeature where the URL bar and search bar are combined! They are two different functions. When I type a domain I don't want the browser to send that straight to Google and give me a search result page. It makes a little bit of sense on phones where space is tight, but on desktop browsers it's just annoying.
It's an extraordinarily useful feature that millions of users take advantage of every day. The idea that the operators of core Internet infrastructure would be pressuring applications not to have these kinds of features is what should alarm you, not that the features exist. The cart doesn't drag the horse.
Sadly, switching back to the old behavior has problems in Firefox. If you go to google.com and start typing in the search box, it will redirect you to the URL bar, where your search will fail because it is not a search box.
It's the most popular implementation of what is by a very wide margin the most popular and important app on the Internet. What percentage of traffic would you have assumed it represented?
A lot less than that. The point of the hierarchy of DNS caching is to keep traffic at the root (and points of authority) low. I would assume that the vast majority of traffic to/from root DNS servers to be due to access by caching resolver daemons not individual user applications, especially not one particular user application (no matter how popular).
Less than genuine lookups? Even if all the world's DNS traffic came from Chrome, probe lookups are taking 50% of all root server traffic just to confirm NXDOMAIN over and over.
The browser confirms it at startup and when DNS settings change, not "over and over"; the "over and over" comes from the fact that zillions of people use Chrome. Serving the zillions of people who actually use the Internet is the job of the root servers. If they wanted to serve a small number of requests, they could go be the authority server for zombo.com instead.
Yes, but how do I know that there isn't a Cloudflare edge node somewhere in my storage cabinet, announcing anycast routes for 1.0.0.1 and 1.1.1.1 to reduce latency?
I think a couple of ICMP requests from a tiny subset of all internet users every now and then isn't too much strain, but if you want to lighten the load you can try alternating between Google's 8.8.8.8 and Cloudflare's 1.1.1.1 :)
Something that doesn't seem to be directly addressed here is the data-leaking nature of this thing.
I don't know what a typical user of Chromium expects to happen when they type arbitrary strings into the Omnibox, but I can tell you that they probably don't expect the full text to be forwarded to several random, centralized, and highly popular DNS servers that are neither under the control of Google nor their ISPs nor their employers/institutions.
This may have the effect of logging all sorts of stuff that normally Google would be hoovering up, but instead it's in the hands of third parties. Extremely trustworthy parties indeed, but they are also high-value high-stakes targets for anyone who'd want to steal valuable data in the form of query logs.
Obviously there's no good solution to this. I believe Firefox led the charge with the "Awesome Bar" or whatever it was first called, which habituated me and millions of users to type search queries directly into the URL bar.
It's one of those features that's so amazingly useful that it really forces the giants to carefully calculate the tradeoff costs, such as the one discussed in TFA.
Wait, I thought the check here is first done with random strings, and only if those return NXDOMAIN as expected, then the real search entries that might be hostnames are sent to your configured DNS server (and only after pressing enter).
You're correct, that's what this post is about: random strings forwarded to the DNS roots to establish, without leaking any intent to the user's ISP DNS server, whether DNS NXDOMAINs are being intercepted. There's no privacy implication to this beyond the privacy benefit users get from things that push back on interception.
https://blog.verisign.com/domain-names/chromiums-reduction-o...