I'm trying to extract the subdomain from the HTTP_HOST value. However, I've run into a problem: if the subdomain contains more than one dot, it fails to match properly. Given that this script has to run on multiple different domains, that the subdomain could contain any number of dots, and that the TLD could be either one or two parts (of any length), is there a practical way of correctly matching the subdomain, domain and TLD in all situations?

For example, take the following HTTP_HOST values and what needs to be matched for each.

  • www.buggedcom.co.uk
    • Subdomain: www
    • Domain: buggedcom.co.uk
    • TLD: co.uk
  • www.buggedcom.com
    • Subdomain: www
    • Domain: buggedcom.com
    • TLD: com
  • test.buggedcom.co.uk
    • Subdomain: test
    • Domain: buggedcom.co.uk
    • TLD: co.uk
  • test.buggedcom.com
    • Subdomain: test
    • Domain: buggedcom.com
    • TLD: com
  • multi.sub.test.buggedcom.co.uk
    • Subdomain: multi.sub.test
    • Domain: buggedcom.co.uk
    • TLD: co.uk
  • multi.sub.test.buggedcom.com
    • Subdomain: multi.sub.test
    • Domain: buggedcom.com
    • TLD: com

I am presuming that the only way to accomplish this would be to load a list of TLDs, which, although possible, I don't really want to do, as this is at the start of a script and shouldn't really require heavy lifting like that.

Below is the current code.

define('HOST', isset($_SERVER['HTTP_HOST']) === true ? $_SERVER['HTTP_HOST'] : (isset($_SERVER['SERVER_ADDR']) === true ? $_SERVER['SERVER_ADDR'] : $_SERVER['SERVER_NAME']));
$domain_parts = explode('.', HOST); 
$domain_parts_count = count($domain_parts);
if($domain_parts_count > 1)
{   
    $sub_parts = array_splice($domain_parts, 0, $domain_parts_count-3);
    define('SUBDOMAIN', implode('.', $sub_parts));
    unset($sub_parts);
}
else
{
    define('SUBDOMAIN', '');
}
define('DOMAIN', implode('.', $domain_parts));
var_dump($domain_parts, SUBDOMAIN, DOMAIN);exit;

Just a thought: could mod_rewrite append the subdomain as a GET parameter?

A: 

With preg_match, you can extract the subdomain and tld parts in one go, like this:

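// Returns array(subdomain, tld) for a host on the known "buggedcom" domain,
// or FALSE if the host does not contain ".buggedcom.".
function get_domain_parts($domain) {
    $parts = array();
    // Anchored so the whole host must match; group 1 is everything before
    // ".buggedcom.", group 2 is everything after it.
    $pattern = '/^(.*)\.buggedcom\.(.*)$/';
    if (preg_match($pattern, $domain, $parts) === 1) {
        return array($parts[1], $parts[2]);
    } else {
        return FALSE;
    }
}

$result = get_domain_parts("multi.sub.test.buggedcom.co.uk");
if ($result) {
    echo($result[0] . " and " . $result[1]); // multi.sub.test and co.uk
}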
André Laszlo
That won't work for me, because this won't be run on one known domain, so I can't check against anything. Also, it's run before the configuration loads the base URL, for various optimization/caching reasons.
buggedcom
Oh, I see. I guess you'll have to go with evolve's solution then :)
André Laszlo
A: 

First of all, I would explode on a slash (and use the first index of the resulting array), just to be sure that the string ends with the TLD.

Then I would cut it with a preg_replace. This regexp matches the domain + TLD regardless of TLD type. Beware, however, that it gives a problem with 2- and 3-letter domains. But it should give a push in the right direction...

[a-zA-Z0-9]+\.(([a-zA-Z]{2,6})|([a-zA-Z]{2,3}\.[a-zA-Z]{2,3}))$

Edit: as pointed out, .museum is also possible, so I edited the first pattern in the TLD part.

And of course TLDs like .uk could behave differently than co.uk. Ugh, it's not that easy...
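
As a rough illustration of the pattern above in use, here is a minimal sketch. The helper name split_host_by_pattern() and the port-stripping step are assumptions added for the example, not part of the original answer, and the result is only a heuristic: it shows the 2- and 3-letter domain problem mentioned above.

```php
<?php
// Hypothetical helper (not a PHP built-in): splits a host using the
// heuristic pattern quoted above. Returns null when nothing matches.
function split_host_by_pattern(string $host): ?array
{
    // Keep only the part before any slash, then drop an optional ":port".
    $host = explode('/', $host)[0];
    $host = preg_replace('/:\d+$/', '', $host);

    $pattern = '/[a-zA-Z0-9]+\.(([a-zA-Z]{2,6})|([a-zA-Z]{2,3}\.[a-zA-Z]{2,3}))$/';
    if (preg_match($pattern, $host, $m) !== 1) {
        return null; // e.g. "localhost" or a bare IP address
    }

    // $m[0] is the matched domain + TLD, $m[1] is the TLD part alone.
    $prefix = substr($host, 0, strlen($host) - strlen($m[0]));

    return array(
        'subdomain' => rtrim($prefix, '.'),
        'domain'    => $m[0],
        'tld'       => $m[1],
    );
}

// "multi.sub.test.buggedcom.co.uk" gives subdomain "multi.sub.test",
// domain "buggedcom.co.uk", tld "co.uk".
var_dump(split_host_by_pattern('multi.sub.test.buggedcom.co.uk'));

// Known weakness: "test.ab.com" is read as domain "test.ab.com" with
// tld "ab.com", because "ab.com" looks like a two-part TLD.
var_dump(split_host_by_pattern('test.ab.com'));
```

The second call shows why a pattern alone cannot tell a short domain name from a second-level suffix.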

Deefjuh
Ouch. You don't think .info, .museum etc have a right to exist? :)
bzlm
Ouch, you are totally right.
Deefjuh
A: 

Not to be nit-picky, but technically speaking .co.uk is a second level domain.

.uk is the "Country Code Top Level Domain" in that case, and the .co is for "Commercial Use" defined by the United Kingdom.

This might not answer your question though.

Wikipedia has a pretty complete list of TLDs; as you can see, each contains only one "dot" followed by one "string".
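
If a list does turn out to be necessary, it need not be heavy. The following is a minimal sketch, assuming a small hard-coded array of suffixes; the helper name split_host_by_suffix_list() and the three-entry list are hypothetical and only cover the hosts in the question. A real deployment would need a fuller list, for example one derived from the Public Suffix List.

```php
<?php
// Hypothetical helper (not a PHP built-in): splits a host using a
// caller-supplied list of suffixes. Returns null if no suffix matches.
function split_host_by_suffix_list(string $host, array $suffixes): ?array
{
    // Normalise case and drop an optional ":port".
    $host   = strtolower(preg_replace('/:\d+$/', '', $host));
    $labels = explode('.', $host);
    $count  = count($labels);

    // Try the longest candidate suffix first, so "co.uk" wins over "uk".
    for ($i = 1; $i < $count; $i++) {
        $suffix = implode('.', array_slice($labels, $i));
        if (in_array($suffix, $suffixes, true)) {
            return array(
                'subdomain' => implode('.', array_slice($labels, 0, $i - 1)),
                'domain'    => implode('.', array_slice($labels, $i - 1)),
                'tld'       => $suffix,
            );
        }
    }
    return null;
}

// Illustrative list only; extend it with the suffixes you actually serve.
$suffixes = array('com', 'uk', 'co.uk');

// subdomain "multi.sub.test", domain "buggedcom.co.uk", tld "co.uk"
var_dump(split_host_by_suffix_list('multi.sub.test.buggedcom.co.uk', $suffixes));
```

Checking a short in-memory array costs very little at the start of a script, which may address the worry about heavy lifting; the trade-off is that any suffix missing from the list falls through to null.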

evolve
I think it answers the question. Just not in the way the OP had hoped. :)
bzlm
Oh yeah, I did know that. Sorry, incorrect example.
buggedcom