How would one get the root DNS entry from $_SERVER['HTTP_HOST']?

Input:

example.co.uk
www.example.com
blog.example.com
forum.example.co.uk

Output:

example.co.uk
example.com
example.com
example.co.uk

EDIT: Lookup list is very long

A: 

I think that's a bit ill-defined.

You could try doing DNS lookups for each parent record until you find one that doesn't return an A record.

Joshua
Better, keep looking up the DNS tree until you find an SOA record. However, although this sounds like a good idea in theory, it doesn't work in practice (I tried once, and there are just too many broken DNS configurations out there).
Greg Hewgill
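
For illustration, here is a minimal sketch of the "walk up the tree until an SOA record turns up" idea from the comment above. The function name find_soa_name and the $has_soa callback are hypothetical, introduced only so the walk can be tried without touching the network; by default the callback wraps PHP's checkdnsrr() with type 'SOA'. As the comment warns, real-world DNS configurations make this unreliable, so treat it as a demonstration of the loop rather than a dependable answer.

```php
<?php
/**
 * Walks from $host towards the root and returns the first name for which
 * $has_soa reports an SOA record, or FALSE if none is found.
 *
 * $has_soa is a hypothetical injection point; when omitted it falls back
 * to a live lookup via checkdnsrr($name, 'SOA').
 */
function find_soa_name($host, $has_soa = null) {
    if ($has_soa === null) {
        $has_soa = function ($name) {
            return checkdnsrr($name, 'SOA');
        };
    }

    $labels = explode('.', rtrim($host, '.'));
    // Stop before the last label so a bare TLD is never tested.
    while (count($labels) > 1) {
        $name = implode('.', $labels);
        if ($has_soa($name)) {
            return $name;
        }
        array_shift($labels); // Drop the leftmost label and try the parent.
    }
    return false;
}

// Offline usage with a fake lookup that pretends only these names have an SOA.
$fake_soa = function ($name) {
    return in_array($name, array('example.co.uk', 'co.uk'), true);
};
var_dump(find_soa_name('forum.example.co.uk', $fake_soa));
```

Note that if a subdomain such as blog.example.com is delegated as its own zone, this walk stops there instead of at example.com.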
A: 

As you've discovered, some countries use a TLD only (for example .tv, .us), while others subdivide their country TLD (for example .uk).

Ideally, you'll need a lookup list (it won't be long) of approved TLDs, and, if subdivided, the TLD with each subdivision (e.g., ".co.uk" instead of ".uk"). That will tell you which "dots" (from the right) to keep. Then move one dot to the left of that (if found) and chop everything before it.

Without a lookup list, you can exploit the fact that the subdivisions (.co, etc.) are only for countries (which have 2-letter TLDs) and are AFAIK never more than 3 characters themselves and are always letters, so you can probably recognize them with a regex pattern.

Edit: Never mind, the actual list of public suffixes is much more complex. You're going to need a lookup table to figure out what the suffix is, go back to the previous dot, and trim everything to the left of it. A regex is a poor solution here. Instead, store the list of suffixes in a dictionary, then test your domain name against it, lopping off one dotted portion at a time from the left until you hit a match, then add back the part you just trimmed off.
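
The lop-and-test loop described above could be sketched roughly like this. The function name root_domain_from_table and the four-entry $suffixes table are made up for the example; a real table would be loaded from the full public suffix list, whose wildcard and exception rules this sketch ignores.

```php
<?php
// Hypothetical, heavily truncated suffix table keyed by suffix for fast
// isset() lookups. A real one would come from the full public suffix list.
$suffixes = array(
    'com'    => true,
    'uk'     => true,
    'co.uk'  => true,
    'org.uk' => true,
);

/**
 * Returns the suffix plus one label for $host, or FALSE if no suffix
 * matches or the host is itself a bare suffix.
 */
function root_domain_from_table($host, array $suffixes) {
    $labels = explode('.', strtolower(rtrim($host, '.')));
    $trimmed = null; // Label most recently lopped off the left.

    while (count($labels) > 0) {
        $candidate = implode('.', $labels);
        if (isset($suffixes[$candidate])) {
            // Add back the part that was just trimmed off, if there was one.
            return $trimmed === null ? false : $trimmed . '.' . $candidate;
        }
        $trimmed = array_shift($labels);
    }
    return false;
}

foreach (array('example.co.uk', 'www.example.com', 'blog.example.com', 'forum.example.co.uk') as $host) {
    echo root_domain_from_table($host, $suffixes), "\n";
}
```

Because labels are removed from the left, the longest matching suffix is always found first, so "co.uk" wins over "uk".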

richardtallent
A: 
/[^.]+\.(escaped|list|of|domains)$/

I think that should work.

orlandu63
That's a very big list: http://mxr.mozilla.org/mozilla-central/source/netwerk/dns/src/effective_tld_names.dat?raw=1
mikeytown2
Then create a tree or whatever they're called.
orlandu63
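
One way to read the "create a tree" suggestion is a tree of labels read from right to left, so the lookup cost depends on the number of labels in the host rather than the size of the list. The sketch below assumes that reading; build_suffix_tree, root_domain_from_tree, the '#' end marker, and the three sample suffixes are all hypothetical, and the wildcard and exception rules of the real list are not handled.

```php
<?php
/**
 * Builds a nested array keyed by labels, read right to left, so that
 * "co.uk" becomes $tree['uk']['co']. The '#' key marks the end of a suffix
 * ('#' cannot appear in a valid host label, so it cannot collide).
 */
function build_suffix_tree(array $suffixes) {
    $tree = array();
    foreach ($suffixes as $suffix) {
        $node = &$tree;
        foreach (array_reverse(explode('.', $suffix)) as $label) {
            if (!isset($node[$label])) {
                $node[$label] = array();
            }
            $node = &$node[$label];
        }
        $node['#'] = true;
        unset($node); // Break the reference before the next suffix.
    }
    return $tree;
}

/**
 * Returns the longest matching suffix plus one label, or FALSE if the
 * suffix is unknown or the host is itself a bare suffix.
 */
function root_domain_from_tree($host, array $tree) {
    $labels = array_reverse(explode('.', strtolower($host)));
    $node = $tree;
    $suffix_len = 0; // Number of labels in the longest suffix matched so far.

    foreach ($labels as $i => $label) {
        if (!isset($node[$label])) {
            break;
        }
        $node = $node[$label];
        if (isset($node['#'])) {
            $suffix_len = $i + 1;
        }
    }
    if ($suffix_len === 0 || $suffix_len >= count($labels)) {
        return false;
    }
    $keep = array_slice($labels, 0, $suffix_len + 1);
    return implode('.', array_reverse($keep));
}

$tree = build_suffix_tree(array('com', 'uk', 'co.uk'));
echo root_domain_from_tree('forum.example.co.uk', $tree), "\n";
echo root_domain_from_tree('www.example.com', $tree), "\n";
```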
A: 

Note: as pointed out in the comments, this method doesn't actually work in all cases. The reason is that some top-level domains do resolve to IP addresses, even if most do not. It is therefore not possible to detect whether a given name is a top-level or pseudo-top-level domain name merely by checking whether it has an IP address. Unfortunately, this probably means that the only solution is a lookup list, given how inconsistently top-level domains are treated in practice.

I repeat: do not rely on the code below to work for you. I leave it here for educational purposes only.

There is a way to do this without a lookup list. The list may be unreliable or incomplete, whereas this method is guaranteed to work:

<?php

function get_domain($url) {
    $dots = substr_count($url, '.');
    $domain = '';

    for ($end_pieces = $dots; $end_pieces > 0; $end_pieces--) {
        // end() takes its argument by reference, so it needs a real variable.
        $pieces = explode('.', $url, $end_pieces);
        $test_domain = end($pieces);

        if (dns_check_record($test_domain, 'A')) {
            $domain = $test_domain;
            break;
        }
    }

    return $domain;
}

$my_domain = get_domain('www.robknight.org.uk');

echo $my_domain;

?>

In this case, it will output 'robknight.org.uk'. It would work equally well for .com, .edu, .com.au, .ly or whatever other top-level domain you're operating on.

It works by starting from the right and doing a DNS check on the first thing that looks like it might be a viable domain name. In the example above, it starts with 'org.uk', but discovers that this is not an actual domain name, but is a ccTLD. It then moves on to check 'robknight.org.uk', which is valid, and returns that. If the domain name had been, say, 'www.php.net', it would have started by checking 'php.net', which is a valid domain name, and would have returned that immediately without looping. I should also point out that if no valid domain name is found, an empty string ('') will be returned.

This code may be unsuitable for processing a large number of domain names in a short space of time due to the time taken for DNS lookups, but it's perfectly fine for single lookups or code that isn't time-critical.

Rob Knight
Storing the result in a DB would be the way to go. You might want to set the type to "A"; see http://php.net/checkdnsrr.
mikeytown2
Example code doesn't work, but I think you gave me enough to get it working. http://org.uk/ is a real website. Once I get a working solution, this thread will have the working code http://drupal.org/node/567518
mikeytown2
Rob Knight
Have modified the code to check only for 'A' records though.
Rob Knight
Many mistakes in the explanations: 'org.uk' is not a ccTLD, 'org.uk' IS an actual domain name, it is even a non-empty one (there is a SOA record and many NS records)
bortzmeyer
For the purposes of what the questioner is trying to do, as I understand it, what matters is that org.uk does not resolve to an IP address.
Rob Knight
Wrong algorithm, since some TLDs have an A record (such as ".dk"). -1
bortzmeyer
Hmm. Good point, I did not know that. Will edit my answer to explain.
Rob Knight
A: 

For this project: http://drupal.org/project/parallel

Usage:

echo parallel_get_domain("www.robknight.org.uk") . "<br>";
echo parallel_get_domain("www.google.com") . "<br>";
echo parallel_get_domain("www.yahoo.com") . "<br>";

Functions:

/**
 * Given host name returns top domain.
 *
 * @param $host
 *   String containing the host name: www.example.com
 *
 * @return string
 *   top domain: example.com
 */
function parallel_get_domain($host) {
  if (strtoupper(substr(PHP_OS, 0, 3)) == 'WIN' && strnatcmp(phpversion(),'5.3.0') < 0) {
    // This works 1/2 the time... CNAME doesn't work with nslookup
    for ($end_pieces = substr_count($host, '.'); $end_pieces > 0; $end_pieces--) {
      // end() takes its argument by reference, so it needs a real variable.
      $pieces = explode('.', $host, $end_pieces);
      $test_domain = end($pieces);
      if (checkdnsrr($test_domain)) {
          $domain = $test_domain;
          break;
      }
    }
    return isset($domain) ? $domain : FALSE;
  }
  else {
    // This always works
    $sections = explode('.', $host);
    array_unshift($sections, '');
    foreach($sections as $key => $value) {
      $parts[$key] = $value;
      $test_domain = implode('.', parallel_array_xor($parts, $sections));
      if (checkdnsrr($test_domain, 'NS') && !checkdnsrr($test_domain, 'CNAME')) {
        $domain = $test_domain;
        break;
      }
    }
    return isset($domain) ? $domain : FALSE;
  }
}

/**
 * Opposite of array_intersect().
 *
 * @param $array_a
 *   First array
 * @param $array_b
 *   Second array
 *
 * @return array
 */
function parallel_array_xor ($array_a, $array_b) {
  $union_array = array_merge($array_a, $array_b);
  $intersect_array = array_intersect($array_a, $array_b);
  return array_diff($union_array, $intersect_array);
}

/**
 * Win compatible version of checkdnsrr.
 *
 * checkdnsrr() support for Windows by HM2K <php [spat] hm2k.org>
 * http://us2.php.net/manual/en/function.checkdnsrr.php#88301
 *
 * @param $host
 *   String containing host name
 * @param $type
 *   String containing the DNS record type
 *
 * @return bool
 */
function parallel_win_checkdnsrr($host, $type='MX') {
  if (strtoupper(substr(PHP_OS, 0, 3)) != 'WIN') { return FALSE; }
  if (empty($host)) { return FALSE; }
  $types=array('A', 'MX', 'NS', 'SOA', 'PTR', 'CNAME', 'AAAA', 'A6', 'SRV', 'NAPTR', 'TXT', 'ANY');
  if (!in_array($type, $types)) {
    user_error("checkdnsrr() Type '$type' not supported", E_USER_WARNING);
    return FALSE;
  }
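  @exec('nslookup -type=' . $type . ' ' . escapeshellcmd($host), $output);

  // Quote the host so that its dots are matched literally.
  $pattern = '/^' . preg_quote($host, '/') . '/';
  foreach ($output as $line) {
    if (preg_match($pattern, $line)) { return TRUE; }
  }
  return FALSE;
}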

// Define checkdnsrr() if it doesn't exist
if (!function_exists('checkdnsrr')) {
  function checkdnsrr($host, $type='MX') {
    return parallel_win_checkdnsrr($host, $type);
  }
}

Output - Windows:

org.uk
google.com
yahoo.com

Output - Linux:

robknight.org.uk
google.com
yahoo.com
mikeytown2