I have a URL like this:

http://www.w3schools.com/PHP/func_string_str_split.asp

I want to extract only the host part of that URL. For that I am using

parse_url($url,PHP_URL_HOST);

It returns www.w3schools.com, but I want only 'w3schools.com'. Is there a function for that, or do I have to do it manually?
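
To make the starting point concrete, here is a minimal sketch of the call described above. The variable names are only illustrative.

```php
<?php
$url = 'http://www.w3schools.com/PHP/func_string_str_split.asp';

// PHP_URL_HOST asks parse_url() for just the host component as a string.
$host = parse_url($url, PHP_URL_HOST);

echo $host, PHP_EOL; // www.w3schools.com
```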

A: 

You have to strip off the subdomain part yourself; there is no built-in function for this.

// $domain being www.w3schools.com
$domain = implode('.', array_slice(explode('.', $domain), -2));

The above example also works for subdomains of unlimited depth, as it will always return the last two domain parts (domain and top-level domain).

If you only want to strip off www., you can simply use str_replace(), which will also be faster:

$domain = str_replace('www.', '', $domain);
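
As a rough illustration of the array_slice() approach, the sketch below wraps it in a helper and tries it on a few hosts. The function name lastTwoLabels and the sample hosts are made up for this example, and the type hints assume PHP 7 or later.

```php
<?php
// Hypothetical helper: keep only the last two dot-separated labels.
function lastTwoLabels(string $host): string
{
    // A negative offset counts from the end; shorter arrays are returned whole.
    return implode('.', array_slice(explode('.', $host), -2));
}

echo lastTwoLabels('www.w3schools.com'), PHP_EOL; // w3schools.com
echo lastTwoLabels('a.b.c.example.com'), PHP_EOL; // example.com
echo lastTwoLabels('example.com'), PHP_EOL;       // example.com
echo lastTwoLabels('localhost'), PHP_EOL;         // localhost
```

Hosts with fewer than three labels come back unchanged, so a bare domain is not damaged.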
Stefan Gehrig
+6  A: 

Lots of ways you could do this. A simple replace is the fastest if you know you always want to strip off 'www.'

$stripped=str_replace('www.', '', $domain);

A regex replace lets you bind that match to the start of the string:

$stripped=preg_replace('/^www\./', '', $domain);
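
The difference between the two calls above shows up when 'www.' also appears later in the host. A small sketch, using a made-up host name:

```php
<?php
$domain = 'www.mywww.com'; // made-up host with "www." in two places

// str_replace() removes every occurrence, including the one inside the name.
$loose = str_replace('www.', '', $domain);

// The ^ anchor restricts the match to the very start of the string.
$anchored = preg_replace('/^www\./', '', $domain);

echo $loose, PHP_EOL;    // mycom
echo $anchored, PHP_EOL; // mywww.com
```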

If you always want to strip the first part of the domain, whether or not it is www, you could use explode/implode. It is easy to read, but it is the least efficient method:

$parts=explode('.', $domain);
array_shift($parts); //eat first element
$stripped=implode('.', $parts);

A regex achieves the same goal more efficiently:

$stripped=preg_replace('/^\w+\./', '', $domain);

Now you might imagine that this would be more efficient:

$period=strpos($domain, '.');
if ($period!==false)
{
    $stripped=substr($domain,$period+1);
}
else
{
    $stripped=$domain; //there was no period
}

But I benchmarked it and found that over a million iterations, the preg_replace version consistently beat it. Typical results:

  • Simple str_replace: 1.404s
  • preg_replace with /^\w+\./: 2.097s
  • strpos/substr: 2.783s
  • explode/implode: 3.470s
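
The benchmark code itself is not shown, so the following is only a sketch of how such a comparison might be timed. The timeIt helper, the iteration count, and the test host are assumptions, and the absolute numbers will vary with PHP version and hardware.

```php
<?php
// Hypothetical harness: call $fn repeatedly and return elapsed seconds.
function timeIt(callable $fn, string $domain, int $iterations): float
{
    $start = microtime(true);
    for ($i = 0; $i < $iterations; $i++) {
        $fn($domain);
    }
    return microtime(true) - $start;
}

// The four approaches from the answer, each wrapped in a closure.
$candidates = [
    'str_replace'     => function ($d) { return str_replace('www.', '', $d); },
    'preg_replace'    => function ($d) { return preg_replace('/^\w+\./', '', $d); },
    'strpos/substr'   => function ($d) {
        $period = strpos($d, '.');
        return $period !== false ? substr($d, $period + 1) : $d;
    },
    'explode/implode' => function ($d) {
        $parts = explode('.', $d);
        array_shift($parts); // drop the first label
        return implode('.', $parts);
    },
];

$timings = [];
foreach ($candidates as $name => $fn) {
    $timings[$name] = timeIt($fn, 'www.w3schools.com', 100000);
    printf("%-16s %.3fs\n", $name, $timings[$name]);
}
```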
Paul Dixon
Nice of you to benchmark!
altermativ
A: 

You need to strip off any characters before the first occurrence of the [.] character (along with the [.] itself) if and only if there is more than one occurrence of [.] in the returned string.

For example, if the returned string is www-139.in.ibm.com, then the regular expression should return in.ibm.com, since that would be the domain.

If the returned string is music.domain.com, then the regular expression should return domain.com.

In rare cases you can access the site without the server prefix, that is, using http://domain.com/pageurl. In that case you get the domain directly as domain.com, and the regex should not strip anything.

IMO this should be the pseudo logic of the regex. If you want, I can write a regex for you that covers these cases.
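
The rule described above can also be sketched without a regex, using substr_count() and strpos(). The function name stripFirstLabel is made up for this example, and the type hints assume PHP 7 or later.

```php
<?php
// Hypothetical helper following the rule above: drop the first label
// only when the host contains more than one dot.
function stripFirstLabel(string $host): string
{
    if (substr_count($host, '.') > 1) {
        // Cut everything up to and including the first dot.
        return substr($host, strpos($host, '.') + 1);
    }
    return $host; // zero or one dot: leave the host untouched
}

echo stripFirstLabel('www-139.in.ibm.com'), PHP_EOL; // in.ibm.com
echo stripFirstLabel('music.domain.com'), PHP_EOL;   // domain.com
echo stripFirstLabel('domain.com'), PHP_EOL;         // domain.com
```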

Rutesh Makhijani
This is what I wanted. Can you help me with how to do it?
Jasim
Dixon's suggestion does that.
Emil H
Dixon's regex will not work on bare domains. For example, "domain.com" will be turned into "com". Here is another regex that conforms to Rutesh's pseudo logic: `$domain = preg_replace('/^(?(?=[^.]++\.[^.]++\.)[^.]++\.|)/', '', $domain);`
Geert
This will work for TLDs like .com, .net, etc., but what about a domain like geograph.org.uk? You would end up with the invalid org.uk.
Paul Dixon
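
One way to handle hosts like geograph.org.uk is to check the last two labels against a list of known multi-label suffixes before deciding how much to keep. The sketch below is only an illustration: SAMPLE_SUFFIXES is a tiny hand-written list and registrableDomain is a made-up name. A real solution would need a complete, maintained list such as the Public Suffix List.

```php
<?php
// ASSUMPTION: a tiny hand-written sample, not a complete suffix list.
const SAMPLE_SUFFIXES = ['org.uk', 'co.uk'];

// Hypothetical helper: keep one label in front of the matched suffix.
function registrableDomain(string $host): string
{
    $labels = explode('.', $host);
    $lastTwo = implode('.', array_slice($labels, -2));

    // Keep three labels when the last two form a known suffix, else two.
    $keep = in_array($lastTwo, SAMPLE_SUFFIXES, true) ? 3 : 2;

    return implode('.', array_slice($labels, -$keep));
}

echo registrableDomain('www.geograph.org.uk'), PHP_EOL; // geograph.org.uk
echo registrableDomain('geograph.org.uk'), PHP_EOL;     // geograph.org.uk
echo registrableDomain('www.w3schools.com'), PHP_EOL;   // w3schools.com
```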