Ah - if you just want to handle three character top level domains - then this code works:
<?php
// let's test the code works: these should all return
// example.com , example.net or example.org
$domains=Array('here.example.com',
'example.com',
'example.org',
'here.example.org',
'example.com/ignorethis',
'example.net/',
'http://here.example.org/longtest?string=here');
foreach ($domains as $domain) {
testdomain($domain);
}
function testdomain($url) {
if (preg_match('/^((.+)\.)?([A-Za-z][0-9A-Za-z\-]{1,63})\.([A-Za-z]{3})(\/.*)?$/',$url,$matches)) {
print 'Domain is: '.$matches[3].'.'.$matches[4].'<br>'."\n";
} else {
print 'Domain not found in '.$url.'<br>'."\n";
}
}
?>
$matches[1]/$matches[2] will contain any subdomain and/or protocol, $matches[3] contains the domain name, $matches[4] the top level domain and $matches[5] contains any other URL path information.
To match most common top level domains you could try changing it to:
if (preg_match('/^((.+)\.)?([A-Za-z][0-9A-Za-z\-]{1,63})\.([A-Za-z]{2,6})(\/.*)?$/',$url,$matches)) {
Or to get it coping with everything:
if (preg_match('/^((.+)\.)?([A-Za-z][0-9A-Za-z\-]{1,63})\.(co\.uk|me\.uk|org\.uk|com|org|net|int|eu)(\/.*)?$/',$url,$matches)) {
etc etc