A: 

It could be that the domain simply does not have an MX record. I completely remove the MX entry for my unused / parked domains; it saves my mail server a lot of grief (spam).

There really is no need to go past step 2. If the system (or ISP) resolver returned no MX entry, it's because it already did the extra steps and found nothing. Or, possibly, the system host resolver is too slow (e.g. one run by an ISP).

Still, I think it's appropriate to just bail out if either happened, as it's clearly a DNS or ISP issue, not a problem with the function. Just tell the user that you could not resolve an MX record for the domain, and let them investigate it on their end.

Also, is it feasible to make the resolvers configurable in the application itself, so users could get around a bad name server?

Tim Post
Well, but I can find a valid/working MX after looping 3 times. In fact, I can rarely get the MX answer on the first try. Don't think of hotmail/gmail e-mail addresses. Think of an e-mail like [email protected] (Moldova) for example...
Malkocoglu
Can you set your system resolvers to 4.2.2.1, 4.2.2.2 and 4.2.2.3 respectively and try again? It could be your local resolver that is problematic.
Tim Post
@tinkertim: I sent the DNS query to 4.2.2.1 and it returned the MX record (not the IP, but that is not a problem) in the first loop. I think they are all Verizon DNS servers. Can I trust them no matter what? Isn't it a problem to hardcode one of them in the executable?
Malkocoglu
As far as I know, they are just public DNS servers. I've been using them for various things for about 4 1/2 years now. As far as hard-coding goes, I'm not so sure. However, it looks like you have narrowed down the issue to your system's resolver (likely your ISP's). You might consider raising the timeouts for those connections before hard-coding resolvers.
Tim Post
@tinkertim: I think the problem is the RecursionAvailable flag. I send the DNS query with RecursionDesired=1, but this cannot force the DNS server to recurse. When the DNS server does the recursion, it does all the dirty work that I have described above. I think the DNS server I got from DHCP does not do that :-)
Malkocoglu
I tried the same query with 192.33.14.30 (IANA RootZone, should be more reliable/neutral than Verizon) but unfortunately it does not recurse. It just returns some NS records :-(
Malkocoglu
The roots just tell you what authoritative name server to ask for the answer that you need. Is it not possible to have resolvers configurable within the application?
Tim Post
It would be too much config for the end user. They are not IT guys; they may have an e-mail address but no knowledge of DNS :-) I will use either the DNS server from DHCP or the RootZone/Verizon DNS server to begin my loop and hope to reach the MX at the end...
Malkocoglu
+2  A: 

The authority section in the message, as well as the additional section, are optional. I.e., the name servers and their IPs don't have to be in the response to the MX query. It is up to the DNS server to decide whether to send that extra information, even when the server already has the data.

You are stuck having to query for the MX and then query for the IP of the mail server.
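As a rough illustration of that two-step flow, here is a minimal Python sketch. It assumes the MX reply has already been parsed into a list of (preference, host) pairs plus a dictionary of host-to-address entries taken from the additional section. The function name and data shapes are hypothetical, not part of any DNS library.

```python
def hosts_needing_address_lookup(mx_records, additional):
    """Return MX hosts, best preference first, that still need an A query.

    mx_records: list of (preference, host) tuples from the answer section.
    additional: dict mapping host name -> list of IPs from the additional
                section (may be empty, since that section is optional).
    """
    # Hosts for which the server volunteered at least one address.
    known = {host.rstrip(".").lower() for host, ips in additional.items() if ips}
    pending = []
    for _, host in sorted(mx_records):  # lowest preference value first
        normalized = host.rstrip(".").lower()
        if normalized not in known and normalized not in pending:
            pending.append(normalized)
    return pending
```

Anything this returns would need its own follow-up A query; an empty list means the first reply already carried every address.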

Gonzalo
+2  A: 

Short answer to your question: RFC 1035 says,

NS records cause both the usual additional section processing to locate a type A record, and, when used in a referral, a special search of the zone in which they reside for glue information.

...the additional records section contains RRs which relate to the query, but are not strictly answers for the question.

...When composing a response, RRs which are to be inserted in the additional section, but duplicate RRs in the answer or authority sections, may be omitted from the additional section.

So the bottom line in my opinion is that, yes, if the response does not contain the A record matching the NS record in some section, something is likely misconfigured somewhere. But, as the old dodge goes, "be liberal in what you accept;" if you are going to make the queries, you will need to handle situations like this. DNS is awash in these kinds of problems.

The longer answer requires a question: how are you getting the original DNS server where you are starting the MX lookup?

What you are doing is a non-recursive query: if the first server you query does not know the answer, it points you at another server that is "closer" in the DNS hierarchy to the domain you are looking for, and you have to make the subsequent queries to find the MX record. If you are starting your query at one of the root servers, I think you will have to follow the NS pointers yourself like you are.
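To make the referral-following loop concrete, here is a small Python sketch. It is only an outline under stated assumptions: `send_query` is a hypothetical callable standing in for whatever code sends the packet and parses the reply, and the reply shape (a dict with `answers` and `referrals`) is invented for illustration.

```python
def resolve_iteratively(name, start_server, send_query, max_hops=8):
    """Follow referrals from start_server until some server answers.

    send_query(server_ip, name) is a hypothetical callable returning a dict:
      {"answers": [...], "referrals": [(ns_name, ns_ip_or_None), ...]}
    """
    server = start_server
    for _ in range(max_hops):
        reply = send_query(server, name)
        if reply["answers"]:
            return reply["answers"]
        # Pick the first referred name server that came with a glue address.
        next_server = next((ip for _, ip in reply["referrals"] if ip), None)
        if next_server is None:
            # No glue: a fuller version would resolve the NS name separately.
            return []
        server = next_server
    return []  # gave up after max_hops referrals
```

The hop limit is there so that a misconfigured zone cannot send the loop around in circles forever.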

However, if the starting DNS server is configured in your application (i.e. a manual configuration item or via DHCP), then you should be able to make a recursive request, using the Recursion Desired flag, which will push the repeated lookup off onto the configured DNS server. In that case you would just get the MX record value in your first response. On the other hand, recursive queries are optional, and your local DNS server may not support them (which would be bizarre since, historically, many client libraries relied on recursive lookups).
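For reference, the two flags involved live in the second 16-bit word of the DNS header described in RFC 1035. The sketch below, using only Python's standard library, builds an MX query with Recursion Desired set and checks whether a reply has Recursion Available set. The function names and the default query ID are illustrative assumptions, and nothing here actually sends the packet.

```python
import struct

RD_FLAG = 0x0100   # Recursion Desired: set by the client in the query
RA_FLAG = 0x0080   # Recursion Available: set by the server in the reply
QTYPE_MX = 15
QCLASS_IN = 1


def build_mx_query(domain, query_id=0x1234):
    """Build a raw DNS query for the MX records of domain, with RD set."""
    # Header: ID, flags, QDCOUNT=1, ANCOUNT=0, NSCOUNT=0, ARCOUNT=0.
    header = struct.pack("!HHHHHH", query_id, RD_FLAG, 1, 0, 0, 0)
    # QNAME: each label prefixed by its length, terminated by a zero byte.
    qname = b"".join(
        bytes([len(label)]) + label.encode("ascii")
        for label in domain.rstrip(".").split(".")
    ) + b"\x00"
    return header + qname + struct.pack("!HH", QTYPE_MX, QCLASS_IN)


def recursion_available(response):
    """Return True if the server's reply header has the RA bit set."""
    _, flags = struct.unpack("!HH", response[:4])
    return bool(flags & RA_FLAG)
```

If `recursion_available` comes back false for the configured server, the client is back to following the NS referrals itself.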

In any case, I would personally like to thank you for looking up MX records. I have had to deal with systems that wanted to send mail but could not do the DNS lookups, and the number and variety of bizarre and unpleasant hacks they have used has left me with emotional scars.

Tommy McGuire