#487 DNS resolution of CNAMEs fails often (not always)

Reporter Ge0rg
Owner Zash
Created
Updated
Stars ★★ (2)
Tags
  • Type-Defect
  • Status-Fixed
  • Priority-Medium
  • Milestone-0.10
  1. Ge0rG on

    Using prosody 0.9.8 on a Debian stable system, S2S connections involving CNAMEs often fail with "DNS resolution failed". Eventually, a link is constructed, though. The problem happens with S2S connections involving CNAMEs, e.g. when joining conversations@conference.siacs.eu: Apr 14 12:30:39 mod_s2s debug First attempt to connect to conference.siacs.eu, starting with SRV lookup... Apr 14 12:30:39 adns debug Records for _xmpp-server._tcp.conference.siacs.eu. not in cache, sending query (thread: 0x2524940)... Apr 14 12:30:39 adns debug Reply for _xmpp-server._tcp.conference.siacs.eu. (thread: 0x2524940) Apr 14 12:30:39 mod_s2s debug conference.siacs.eu has no SRV records, falling back to A/AAAA Apr 14 12:30:39 adns debug Records for conference.siacs.eu not in cache, sending query (thread: 0x4bb08c0)... Apr 14 12:30:39 adns debug Records for conference.siacs.eu not in cache, sending query (thread: 0x83b4ed0)... Apr 14 12:30:39 adns debug Reply for conference.siacs.eu (thread: 0x4bb08c0) Apr 14 12:30:39 mod_s2s debug Looking up jabber.siacs.eu. (DNS depth is 3) Apr 14 12:30:39 adns debug Reply for conference.siacs.eu (thread: 0x83b4ed0) Apr 14 12:30:39 mod_s2s debug DNS lookup failed to get a response for conference.siacs.eu Apr 14 12:30:39 s2sout9b972a0 info Out of connection options, can't connect to conference.siacs.eu Apr 14 12:30:39 mod_s2s debug No other records to try for conference.siacs.eu - destroying Apr 14 12:30:39 s2sout9b972a0 debug Destroying outgoing session yax.im->conference.siacs.eu: DNS resolution failed And a later attempt succeeds: Apr 14 12:34:04 mod_s2s debug First attempt to connect to conference.siacs.eu, starting with SRV lookup... Apr 14 12:34:04 adns debug Records for _xmpp-server._tcp.conference.siacs.eu. not in cache, sending query (thread: 0x8764610)... Apr 14 12:34:04 adns debug Reply for _xmpp-server._tcp.conference.siacs.eu. (thread: 0x8764610) Apr 14 12:34:04 mod_s2s debug conference.siacs.eu has no SRV records, falling back to A/AAAA Apr 14 12:34:04 adns debug Records for conference.siacs.eu not in cache, sending query (thread: 0x87677d0)... Apr 14 12:34:04 adns debug Records for conference.siacs.eu not in cache, sending query (thread: 0x39663e0)... Apr 14 12:34:04 adns debug Reply for conference.siacs.eu (thread: 0x87677d0) Apr 14 12:34:04 mod_s2s debug Looking up jabber.siacs.eu. (DNS depth is 3) Apr 14 12:34:04 mod_s2s debug DNS reply for conference.siacs.eu gives us 137.226.142.20 Apr 14 12:34:04 adns debug Reply for conference.siacs.eu (thread: 0x39663e0) Apr 14 12:34:04 s2sout67711a0 info Beginning new connection attempt to conference.siacs.eu ([137.226.142.20]:5269)

  2. Ge0rG on

    it works, please merge it

  3. MattJ on

    Changes
    • owner MattJ
    • tag Status-Accepted
    • tag Milestone-0.9
  4. MattJ on

    it == http://q.zash.se/7d8b86e6db1d.txt

  5. Zash on

    That patch was commited as http://hg.prosody.im/0.9/rev/45222bfb206f

    Changes
    • tag Status-Started
  6. Zash on

    For the record, the linked commit is a part of the solution but there are still some rare issues that need more investigation.

  7. nicoo on

    I hit one of those edge-cases on Prosody 0.9.8, trying to connect to conference.yunohost.org. The records in the yunohost.org changes, and conference.yunohost.org now has SRV records, but it should be possible to ask its operators for the earlier version of the zonefile.

  8. Ge0rG on

    The problem is still happening on 0.10 (last time around the end of October 2016). Unfortunately it's very hard to time the exact moment and obtain pcaps from it.

  9. crypto orchatno on

    is this complicated by Round Robin or similar strategy?

  10. crypto orchatno on

    clicking star decreased from 3⭐ to 2

  11. MattJ on

    We may consider backporting the changes from 0.10 (see issue #1001) to 0.9.

  12. Kamigishi Rei on

    0.10.0, FreeBSD amd64, getting DNS resolution failures with CNAME records. The DNS software in use is unbound (caching, recursive, and validating resolver). Here's the query dump: reading from file dns.pcap, link-type NULL (BSD loopback) 10.0.0.1.53 > 10.0.0.7.38784: [bad udp cksum 0x14cc -> 0xf434!] 29380 q: SRV? _xmpp-server._tcp.myrl.net. 1/4/0 _xmpp-server._tcp.myrl.net. [10m34s] SRV xmpp.kraut.it.:5269 10 0 ns: myrl.net. [21h29m13s] NS ns2.1984hosting.com., myrl.net. [21h29m13s] NS ns2.1984.is., myrl.net. [21h29m13s] NS ns1.1984.is., myrl.net. [21h29m13s] NS ns0.1984.is. (171) 10.0.0.1.53 > 10.0.0.7.30954: [bad udp cksum 0x14cc -> 0xfcc0!] 48112 q: SRV? _xmpp-server._tcp.myrl.net. 1/4/0 _xmpp-server._tcp.myrl.net. [10m17s] SRV xmpp.kraut.it.:5269 10 0 ns: myrl.net. [21h28m56s] NS ns2.1984hosting.com., myrl.net. [21h28m56s] NS ns2.1984.is., myrl.net. [21h28m56s] NS ns1.1984.is., myrl.net. [21h28m56s] NS ns0.1984.is. (171) 10.0.0.7.25880 > 10.0.0.1.53: [bad udp cksum 0x1440 -> 0x753a!] 33219+ A? xmpp.kraut.it. (31) 10.0.0.1.53 > 10.0.0.7.25880: [bad udp cksum 0x14c8 -> 0x4e90!] 33219 q: A? xmpp.kraut.it. 2/4/0 xmpp.kraut.it. [21h28m56s] CNAME nsdap02.vss.kraut.it., nsdap02.vss.kraut.it. [3m4s] A 212.47.240.51 ns: kraut.it. [21h28m56s] NS ns0.1984.is., kraut.it. [21h28m56s] NS ns2.1984.is., kraut.it. [21h28m56s] NS ns2.1984hosting.com., kraut.it. [21h28m56s] NS ns1.1984.is. (167) 10.0.0.7.25880 > 10.0.0.1.53: [bad udp cksum 0x1440 -> 0xa865!] 13208+ AAAA? xmpp.kraut.it. (31) 10.0.0.1.53 > 10.0.0.7.25880: [bad udp cksum 0x1494 -> 0x7e7d!] 13208 q: AAAA? xmpp.kraut.it. 1/1/0 xmpp.kraut.it. [21h28m56s] CNAME nsdap02.vss.kraut.it. ns: kraut.it. [55m17s] SOA ns0.1984.is. hostmaster.1984.is. 2015092340 86400 7200 1209600 86400 (115) The resolver returns the correct A record in the additional data section. Oct 02 11:16:16 mod_s2s debug Best record found, will connect to xmpp.kraut.it.:5269 Oct 02 11:16:16 adns debug Records for xmpp.kraut.it. not in cache, sending query (thread: 0x004c5258)... Oct 02 11:16:16 adns debug Sending DNS query to 10.0.0.1 Oct 02 11:16:16 adns debug Reply for xmpp.kraut.it. (thread: 0x004c5258) Oct 02 11:16:16 mod_s2s debug DNS lookup failed to get a response for xmpp.kraut.it. It seems like adns fails a lot. This also happens with simple IN A queries without CNAME results.

  13. MattJ on

    @Kamigishi Rei - thanks for your additional data here, sorry it took a while to get back to you. Are you (or anyone) able to share a raw pcap file of the issue? To debug we need the pcap and corresponding debug logs for the query from Prosody.

    Changes
    • tags Milestone-0.11
  14. Kamigishi Rei on

    I still have the pcap file (it only has the DNS traffic since it was taken with a port 53 filter) and the prosody debug log from back then. Would it be all right to edit everything except DNS-related stuff from the log before providing it? The server I had issues with seems to no longer be using SRV records so right now I cannot test it properly unless I find someone else with a similar set-up…

  15. MattJ on

    @Kamigishi Rei that would be fine, thanks. It's really just the DNS requests + responses I'm interested in. Feel free to email the pcap to developers@prosody.im (and reference this issue for context).

  16. Kamigishi Rei on

    All right, just sent the pcap + log files. Note that the pcap was generated on FreeBSD (Microsoft Message Analyser, for one, seems to have issues reading it)

  17. MattJ on

    Thanks, received - the file opens fine with Wireshark. It looks like the domain in question actually has CNAME records listed as targets in the SRV records for the XMPP domain. Note that this is technically forbidden by the specifications, and so a domain configured this way cannot be expected to work reliably in all software. However there are domains configured this way in the real world, and we did add code to Prosody to handle them. Therefore this remains a bug because it looks like at least in this case that code didn't work. I'll work on a test case to try to reproduce the issue locally and hopefully pinpoint the problem.

  18. Zash on

    A reasonable explanation may be that net.dns only remembers answers of the same type as the query, ie a query for A records will only add A answers to the cache. So if the answer consists of CNAME and A, the CNAME gets forgotten and when it peeks into the cache later it sees nothing and assumes the query returned zero answers.

  19. Zash on

    Attempted fix in https://hg.prosody.im/0.10/rev/102242a7ee60 Please test

    Changes
    • tags Milestone-0.10 Status-NeedInfo
  20. Zash on

    Haven't heard anyone having these kinds of DNS issues recently, so I'll assume this go fixed and the issue can be closed.

    Changes
    • owner MattJ Zash
    • tags Status-Fixed

New comment

Not published. Used for spam prevention and optional update notifications.