This function works well, but its main flaw is that it doesn't handle domains ending in two-part suffixes such as .co.uk or .com.au. How can it be modified to handle them?
function parseUrl($url) {
    $r  = "^(?:(?P<scheme>\w+)://)?";                        // scheme, e.g. "http://"
    $r .= "(?:(?P<login>\w+):(?P<pass>\w+)@)?";              // optional user:pass@
    $r .= "(?P<host>(?:(?P<subdomain>[-\w\.]+)\.)?"          // optional subdomain(s)
        . "(?P<domain>[-\w]+\.(?P<extension>\w+)))";         // domain + single-label TLD
    $r .= "(?::(?P<port>\d+))?";                             // optional port
    $r .= "(?P<path>[\w/-]*/(?P<file>[\w-]+(?:\.\w+)?)?)?";  // path, with optional file
    $r .= "(?:\?(?P<arg>[\w=&]+))?";                         // query string
    $r .= "(?:#(?P<anchor>\w+))?";                           // fragment
    $r  = "!$r!";  // "!" as delimiter because "/" occurs inside the pattern
    preg_match($r, $url, $out);
    return $out;
}
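For reference, the direction I've been experimenting with is only a sketch: the suffix list below is hard-coded and far from complete, and a robust version would need the full Public Suffix List. The idea is to let the extension group optionally swallow a known second-level suffix, while a negative lookahead stops the domain group from starting on one of those suffix labels:

function parseUrlMulti($url) {
    // Illustrative, incomplete list of second-level suffix labels.
    $sfx = "(?:co|com|org|net|gov|edu|ac)";
    $r  = "^(?:(?P<scheme>\w+)://)?";
    $r .= "(?:(?P<login>\w+):(?P<pass>\w+)@)?";
    // (?!$sfx\.) prevents the domain label from being a suffix label itself,
    // and (?:$sfx\.)? lets the extension absorb a second-level suffix.
    $r .= "(?P<host>(?:(?P<subdomain>[-\w\.]+)\.)?"
        . "(?P<domain>(?!$sfx\.)[-\w]+\.(?P<extension>(?:$sfx\.)?\w+)))";
    $r .= "(?::(?P<port>\d+))?";
    $r .= "(?P<path>[\w/-]*/(?P<file>[\w-]+(?:\.\w+)?)?)?";
    $r .= "(?:\?(?P<arg>[\w=&]+))?";
    $r .= "(?:#(?P<anchor>\w+))?";
    preg_match("!$r!", $url, $out);
    return $out;
}

On 'http://sub1.sub2.test.co.uk/index.html' this gives [domain] => test.co.uk and [subdomain] => sub1.sub2, and it still returns [domain] => example.com for 'http://www.example.com/'. (parseUrlMulti and $sfx are names I made up for the sketch.)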
To clarify: the reason I'm looking for something other than parse_url() is that I want to strip out (possibly multiple) subdomains as well.
Judging by the leading answer so far, there seems to be some confusion about what parse_url does.
print_r(parse_url('http://sub1.sub2.test.co.uk'));
Results in:
Array
(
    [scheme] => http
    [host] => sub1.sub2.test.co.uk
)
What I want to extract is "test.co.uk" (sans subdomains), so running parse_url first is a pointless extra step: the host it returns is the same hostname I started with.
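For what it's worth, the post-processing step I'm trying to avoid writing by hand looks roughly like this (registrableDomain is a name I made up, and the hard-coded suffix list is again just a stand-in for the real Public Suffix List):

// Sketch: reduce a hostname to its registrable domain.
function registrableDomain($host) {
    // Illustrative and incomplete; a robust version would consult
    // the full Public Suffix List instead.
    $twoPartSuffixes = array('co.uk', 'org.uk', 'com.au', 'net.au', 'co.nz');
    $labels = explode('.', $host);
    $suffixLen = 1; // assume a single-label TLD such as ".com" by default
    if (count($labels) > 2
        && in_array(implode('.', array_slice($labels, -2)), $twoPartSuffixes)) {
        $suffixLen = 2; // ".co.uk"-style suffix detected
    }
    // Keep one label in front of the suffix, dropping all subdomains.
    return implode('.', array_slice($labels, -($suffixLen + 1)));
}

echo registrableDomain('sub1.sub2.test.co.uk'); // test.co.uk
echo registrableDomain('www.example.com');      // example.com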