This tool is so neat! One thing I've learned from it is my ISP (sonic.net) seems to be doing queries to _.example.com. For instance:
$ dig @50.0.1.1 nelson.lily6.messwithdns.com a
Results in two queries being answered by the messwithdns server. One for nelson.lily6.messwithdns.com as expected, but also one for _.lily6.messwithdns.com.
Any guesses what that naked underscore query is for? Not every nameserver does it (Cloudflare, Google, Quad9, and Adguard all don't). But Sonic isn't the only one that does.
I've asked on Twitter and the best guess right now is it has something to do with RFC 2782 or RFC 8552. But those are about using _ to make unique tokens that aren't likely domain names, things like _tcp or _udp. What would a naked _ mean?
I wrote it because I wanted more specific advice about how qname minimization should work, and I deliberately aimed it at an ideal world, ignoring obvious interoperability problems. I hoped that this would provoke discussion and get people working towards a more realistic algorithm. But that did not happen until years later.
So the early implementations of qname minimization had to invent their own ways of working around the inevitable interop problems, and some of those solutions were quite creative.
I think the bare _ version is trying to avoid querying delegation points directly, so that it still gets a referral as it would have done using the full qname. And the _ also avoids problems with negative responses, which are often implemented very badly - it is common to make a mess of the distinction between NXDOMAIN and NODATA.
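To make that concrete, here is a rough Python sketch of how such a probe name could be built. It is only a guess that matches the query seen above (_.lily6.messwithdns.com), not code taken from any real resolver; the function name `underscore_probe` and its `known_zone` parameter are made up for illustration.

```python
def underscore_probe(qname: str, known_zone: str) -> str:
    """Guess at the name a resolver using the bare-underscore trick might send.

    qname:      the full name the client asked for
    known_zone: the deepest zone the resolver already has nameservers for
                (hypothetical parameter, for illustration only)
    """
    q = [label for label in qname.lower().rstrip(".").split(".") if label]
    z = [label for label in known_zone.lower().rstrip(".").split(".") if label]

    # The full name must sit inside the zone we are about to query.
    if q[len(q) - len(z):] != z:
        raise ValueError(f"{qname} is not inside {known_zone}")

    # Only one label below the zone: nothing left to hide, send the real name.
    if len(q) - len(z) <= 1:
        return ".".join(q) + "."

    # Otherwise expose one extra label, and put "_" in front of it so the
    # query is for a name *below* the possible delegation point rather than
    # the delegation point itself.
    child = q[len(q) - len(z) - 1:]
    return "_." + ".".join(child) + "."


print(underscore_probe("nelson.lily6.messwithdns.com", "messwithdns.com"))
# _.lily6.messwithdns.com.
```

If lily6 were delegated, the server would answer the probe with a referral; if not, the resolver learns that without having revealed the nelson label.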
After reading through the draft I think I don't understand the argument about user privacy.
Does QNAME minimization try to prevent the scenario where a malicious party has set up a DNS tracker that responds with the same A/AAAA entries for a specific subdomain, in the sense that e.g. "session-id.actualserver.company.tld" results in the same entries as "actualserver.company.tld"?
How would a client detect this before actually resolving it? I mean, if TTL is 0, no client will cache the results and therefore the minimization aspects are kind of irrelevant because the client has to resolve all over again, right?
I think I am having questions about the logical conditions "when" a client tries to resolve "_" before resolving the actual domain, which I am assuming is what the draft proposed...because to me this scenario would have the requirement that the very same party also has ownership of the HTML/actual links in the code, so I don't understand what it's trying to prevent because the same party could just read their Apache logs to gain better datasets.
The scenario is that you want to resolve alice.example.com but you don't want the root servers or the .com servers to know any more information than they need to.
Historically you would send the whole query to all servers. Even the root servers would see the entire fully-qualified domain name (alice.example.com) even though all they're going to do is refer you to the .com servers. With QNAME minimization the root servers only know that you want something under .com and the .com servers only know you want something under .example.com and so on.
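A minimal Python sketch of the difference, assuming for simplicity that there is a zone cut at every label (real resolvers have to discover where the cuts are). The function name `minimized_queries` is just for illustration.

```python
def minimized_queries(fqdn: str) -> list[tuple[str, str]]:
    """Return (zone being asked, name sent) pairs for a minimizing resolver.

    Simplifying assumption: every label is its own zone, so each server
    only needs to see one label more than its own zone name.
    """
    labels = fqdn.lower().rstrip(".").split(".")
    # Names sent, shortest first: "com.", "example.com.", ...
    names = [".".join(labels[-n:]) + "." for n in range(1, len(labels) + 1)]
    # The root (".") is asked first, then each zone learned so far.
    zones = ["."] + names[:-1]
    return list(zip(zones, names))


for zone, sent in minimized_queries("alice.example.com"):
    print(f"ask {zone:<13} for {sent}")
# ask .             for com.
# ask com.          for example.com.
# ask example.com.  for alice.example.com.
```

Without minimization the second column would be alice.example.com. on every row.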
Now suppose the root servers don't do any kind of encryption but example.com supports DNSCurve or some other opportunistic encryption and so do you. Your ISP used to see the query going to the root servers or the .com servers and know the FQDN even if the query to example.com was encrypted. Now they don't.
Likewise, if someone is sitting on the root servers watching all the queries from everyone, they used to see FQDNs, now they only see top level domains.
I didn't have in mind that an ISP could have their own map of all zones where they simply map observed specific DNS traffic to the zones themselves because they know which server is responsible as well.
> Your ISP used to see the query going to the root servers or the .com servers and know the FQDN even if the query to example.com was encrypted. Now they don't.
In practice your recursive resolver either is your ISP (in which case this helps nothing) or is outside of your ISP (and your ISP can't see its queries). The only realistic privacy leak that it addresses is leaking subdomains to the root servers and other delegating servers higher up the chain, and to their network operators.
As others answered, something called qname minimization. Others gave detailed explanations, so I'll try to be shorter.
In traditional DNS, the recursive resolver sends the entire FQDN to every server at each step of the resolution.
Now realize, like every company, DNS operators want to collect and sell your data.
So imagine a 'bigsite.com' that does a lot of things. And you like, say porn.bigsite.com. Without this minimization, everyone from the root to verisign to bigsite knows what you queried for.