Slow DNS resolution on Linux systems with a Windows DNS server
August 1, 2014
Over the last few days I ran into a DNS resolution problem on our Linux systems. It must have been there for a long time, but it took a deep look into a different performance problem to track this one down. A simple wget to an HTTP site in the same data center sometimes took 5 seconds just to resolve the DNS name to an IP address. Being a network guy, I immediately started tcpdump and saw the following packets:
10:59:19.264987 IP LinuxClient.51463 > WindowsDnsServer.domain: 57223+ A? xxxx.penz.name. (35)
10:59:19.265056 IP LinuxClient.51463 > WindowsDnsServer.domain: 26702+ AAAA? xxxx.penz.name. (35)
10:59:19.265700 IP WindowsDnsServer.domain > LinuxClient.51463: 26702* 0/1/0 (103)
10:59:24.269981 IP LinuxClient.51463 > WindowsDnsServer.domain: 57223+ A? xxxx.penz.name. (35)
10:59:24.270303 IP WindowsDnsServer.domain > LinuxClient.51463: 57223* 1/0/0 A 10.10.xxx.xxx (51)
10:59:24.270370 IP LinuxClient.51463 > WindowsDnsServer.domain: 26702+ AAAA? xxxx.penz.name. (35)
10:59:24.270557 IP WindowsDnsServer.domain > LinuxClient.51463: 26702* 0/1/0 (103)
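To spot this pattern in a longer capture, a small filter over tcpdump's text output can help. The sketch below is a shell function (the name find_dns_retransmits is hypothetical). It assumes the default tcpdump line layout shown above: timestamp first, query ID in the sixth field, query type in the seventh, and the server port printed as "domain" or, with -n, as "53". It reports every query the client had to send a second time, along with the gap. A gap of roughly 5 seconds matches glibc's default resolver timeout (the timeout option in resolv.conf), after which the client retries.

```shell
#!/bin/sh
# find_dns_retransmits (hypothetical helper): read tcpdump text output on stdin
# and print each DNS query ID/type that the client sent more than once,
# together with the delay between the first and the repeated send.
# Assumes the default tcpdump layout: "HH:MM:SS.ffffff IP src > dst: ID+ TYPE? name".
find_dns_retransmits() {
  awk '
    # client-to-server queries only: destination port is "domain" (or 53 with -n)
    $2 ~ /^IP6?$/ && $5 ~ /\.(domain|53):$/ {
      split($1, t, ":")
      secs = t[1] * 3600 + t[2] * 60 + t[3]   # timestamp in seconds
      id = $6;   sub(/[+]$/, "", id)          # "57223+" -> "57223"
      type = $7; sub(/[?]$/, "", type)        # "A?"     -> "A"
      key = id " " type
      if (key in first)
        printf "%s %s retransmitted after %.3f s\n", id, type, secs - first[key]
      else
        first[key] = secs
    }
  '
}
```

For example, you could run tcpdump -r capture.pcap port 53 | find_dns_retransmits, where capture.pcap is a placeholder for your own capture file. For the trace above, it should report both the A query (ID 57223) and the AAAA query (ID 26702) as retransmitted after about 5.005 seconds.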
As you can see, the first A query does not get answered, but the AAAA query does; the client only gets its A record after resending the query five seconds later. I switched to another DNS server (first Windows 2008 R2, then Windows 2012 R2), with the same result. I tested with RHEL6/CentOS 6 and Ubuntu 14.04: no difference. Next I talked to the Windows team and asked them to look at the Windows 2012 R2 DNS server. They ran a packet capture and saw that the server never sent the missing reply, although the DNS debug log showed that the DNS server itself had answered the query. I then called wget with the “--inet4-only” option, which makes sure that only an A query is sent, and I could not reproduce the problem. So it had to be something to do with the second packet.
A fellow network admin suggested looking at the source ports of the packets, so I did. The A and AAAA queries used the same UDP source port, and it looked as if the Linux system only gets an answer if the A query is answered before the AAAA query arrives at the Windows server. The next step was to find a way to change this behavior on the Linux side, which seemed easier to me than changing something on the Windows side. 😉
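The shared source port can be checked in the same capture. This sketch is another hypothetical helper that makes the same assumptions about tcpdump's text layout. It lists client source ports that carried more than one query type. For the trace above, it should print port 51463 with both A and AAAA.

```shell
#!/bin/sh
# shared_port_queries (hypothetical helper): read tcpdump text output on stdin
# and list client source ports that carried more than one distinct query type,
# e.g. an A and an AAAA query sent from the same socket.
shared_port_queries() {
  awk '
    $2 ~ /^IP6?$/ && $5 ~ /\.(domain|53):$/ {
      port = $3; sub(/.*\./, "", port)   # "LinuxClient.51463" -> "51463"
      type = $7; sub(/[?]$/, "", type)   # "AAAA?" -> "AAAA"
      key = port SUBSEP type
      if (!(key in seen)) {              # count each port/type pair once
        seen[key] = 1
        if (count[port]++) types[port] = types[port] " " type
        else               types[port] = type
      }
    }
    END {
      for (p in count)
        if (count[p] > 1)
          printf "port %s: %s\n", p, types[p]
    }
  '
}
```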
The following option from the resolv.conf(5) man page looked promising:
single-request-reopen (since glibc 2.9)
The resolver uses the same socket for the A and AAAA requests. Some hardware mistakenly sends back only one reply. When that happens the client system will sit and wait for the second reply. Turning this option on changes this behavior so that if two requests from the same port are not handled correctly it will close the socket and open a new one before sending the second request.
And yes, that was the solution. On every system where I added
options single-request-reopen
to /etc/resolv.conf, the problem went away. On systems that generate resolv.conf automatically (like Ubuntu 14.04), which you can check with ll (Ubuntu's alias for ls -alF):
ll /etc/resolv.conf
lrwxrwxrwx 1 root root 29 Mai 26 12:35 /etc/resolv.conf -> ../run/resolvconf/resolv.conf
you should add the line to /etc/resolvconf/resolv.conf.d/base instead and run
sudo resolvconf -u
afterwards.
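Both steps above can be scripted. This is a minimal sketch under a few assumptions. It treats a resolv.conf that is a symlink ending in resolvconf/resolv.conf as managed by resolvconf, as in the Ubuntu 14.04 listing. The function names and their optional file arguments are hypothetical and exist only to make the functions testable. It must run as root when pointed at the real files. Appending a separate options line is enough, because glibc processes every options line in resolv.conf.

```shell
#!/bin/sh
# pick_resolv_target (hypothetical helper): print the file the option should go into.
# Assumption: a resolv.conf symlinked to .../resolvconf/resolv.conf is regenerated
# by resolvconf, so the edit belongs in resolvconf's base file instead.
# Both arguments are hypothetical overrides; the defaults are the real paths.
pick_resolv_target() {
  conf=${1:-/etc/resolv.conf}
  base=${2:-/etc/resolvconf/resolv.conf.d/base}
  target=$conf
  if [ -L "$conf" ]; then
    case $(readlink "$conf") in
      */resolvconf/resolv.conf) target=$base ;;
    esac
  fi
  printf '%s\n' "$target"
}

# add_single_request_reopen (hypothetical helper): append
# "options single-request-reopen" unless an options line already has it,
# so running it twice does not duplicate the line.
add_single_request_reopen() {
  file=${1:-/etc/resolv.conf}
  if [ -f "$file" ] &&
     grep -Eq '^[[:space:]]*options.*[[:space:]]single-request-reopen([[:space:]]|$)' "$file"; then
    return 0
  fi
  # avoid gluing the new line onto a last line that lacks a trailing newline
  if [ -s "$file" ] && [ -n "$(tail -c 1 "$file")" ]; then
    printf '\n' >> "$file"
  fi
  printf 'options single-request-reopen\n' >> "$file"
}
```

As root, add_single_request_reopen "$(pick_resolv_target)" applies the change. If the resolvconf base file was chosen, follow it with resolvconf -u as described above.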
All in all, this problem took me many hours to track down, and I couldn't find anything about it online, so I thought a post might help other poor admins. 😉
7 Comments
Hi,
Thank you so much for posting this. I had the same problem, and this helped me solve it.
/Thorbjørn
Comment by Thorbjørn Weidemann — December 19, 2014 #
Hi, I had the same problem 🙂 This helped, thank you!
Comment by Behavior — April 19, 2015 #
Thank you very much for this post; we were experiencing the same issue and your solution worked.
While this post is all that is needed for those just looking for a solution, I wrote a blog post which goes into more detail (since it’s not reasonable that so many admins waste a day on such a simple problem, right?):
http://philippecloutier.com/blogpost28-dig-1-and-other-DNS-clients-sometimes-taking-5-seconds-to-return-the-results-of-a-local-query
Comment by Filipus Klutiero — November 27, 2015 #
Totally went through this yesterday with our web app. Took a few hours. mysql client on Ubuntu 14 to a remote host. Consistent 5-second delay. Unfortunately I found your post after the fact because I was googling slightly different terms, but it was nice to commiserate. Hopefully my comment will add keywords that will get others here quicker.
Comment by buster — March 4, 2016 #
Nice, exactly the same problem 🙂 and solved.
Comment by pulsar — April 12, 2016 #
Thanks for posting this. The option is what fixed my issue, but I needed to convince NetworkManager on my system to use it. First I found my connection name, then I set the options. After restarting NetworkManager it worked!
# list connections to find the connection name
nmcli connection show
# set the resolver option for both IPv4 and IPv6 on that connection
nmcli con mod "Wired connection 1" \
  ipv4.dns-options "single-request-reopen" \
  ipv6.dns-options "single-request-reopen"
sudo systemctl restart NetworkManager
Comment by Lee — August 19, 2019 #
Hi,
I had the same problem and did the same analysis before arriving here 🙂
My middleware uses an old glibc, which is a big problem to fix right now. 🙂
Thanks.
Comment by Phil — May 12, 2022 #