ansaurus

Question

Url splitting in php

Answer 1

A:

You have to strip off the subdomain part by yourself - there is no built-in function for this.

// $domain being www.w3scools.com
$domain = implode('.', array_slice(explode('.', $domain), -2));

The above example also works for subdomains of unlimited depth, as it will always return the last two domain parts (domain and top-level domain).
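As a rough sketch of how the one-liner behaves on different inputs, it can be wrapped in a small function. The name lastTwoLabels and the example.com hosts are made up for illustration and are not from the original answer:

```php
<?php
// Hypothetical helper: keeps only the last two dot-separated labels of a host.
function lastTwoLabels(string $host): string
{
    $labels = explode('.', $host);
    // A negative offset makes array_slice count from the end of the array;
    // if the array has fewer than two elements, all of them are returned.
    return implode('.', array_slice($labels, -2));
}

echo lastTwoLabels('www.example.com'), "\n";   // example.com
echo lastTwoLabels('a.b.c.example.com'), "\n"; // example.com
echo lastTwoLabels('example.com'), "\n";       // example.com
echo lastTwoLabels('localhost'), "\n";         // localhost
```

A bare domain passes through unchanged, since there is nothing in front of the last two labels to drop.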

If you only want to strip off www., you can simply use str_replace(), which will indeed be faster:

$domain = str_replace('www.', '', $domain);

Stefan Gehrig 2009-07-09 08:02:52

Answer 2

+6 A:

There are lots of ways you could do this. A simple replace is the fastest if you know you always want to strip off 'www.':

$stripped=str_replace('www.', '', $domain);

A regex replace lets you bind that match to the start of the string:

$stripped=preg_replace('/^www\./', '', $domain);
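To see why anchoring matters, here is a small comparison on a contrived host name (www.mywww.com is an invented example, not from the question). Without the anchor, str_replace removes every occurrence of "www.", not just the leading one:

```php
<?php
$domain = 'www.mywww.com';

// str_replace removes every occurrence of "www.",
// including the one that ends the label "mywww".
$unanchored = str_replace('www.', '', $domain);

// The ^ anchor restricts the match to the start of the string.
$anchored = preg_replace('/^www\./', '', $domain);

echo $unanchored, "\n"; // mycom
echo $anchored, "\n";   // mywww.com
```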

If you always want to remove the first part of the domain, regardless of whether it's www, you could use explode/implode. It's easy to read, but it's the least efficient method:

$parts=explode('.', $domain);
array_shift($parts); //eat first element
$stripped=implode('.', $parts);

A regex achieves the same goal more efficiently:

$stripped=preg_replace('/^\w+\./', '', $domain);

Now you might imagine that this would be more efficient:

$period=strpos($domain, '.');
if ($period!==false)
{
    $stripped=substr($domain,$period+1);
}
else
{
    $stripped=$domain; //there was no period
}

But I benchmarked it and found that over a million iterations, the preg_replace version consistently beat it. Typical results:

Simple str_replace: 1.404s
preg_replace with /^\w+\./: 2.097s
strpos/substr: 2.783s
explode/implode: 3.470s

Paul Dixon 2009-07-09 08:05:52

Nice of you to benchmark!

altermativ 2009-07-09 12:04:51

Answer 3

A:

You need to strip off any characters before the first occurrence of the [.] character (along with the [.] itself) if and only if there is more than one occurrence of [.] in the returned string.

For example, if the returned string is www-139.in.ibm.com, then the regular expression should return in.ibm.com, since that would be the domain.

If the returned string is music.domain.com, then the regular expression should return domain.com.

In rare cases you can access the site without the server prefix, for example via http://domain.com/pageurl. In that case you get the domain directly as domain.com, and the regex should not strip anything.

IMO this should be the pseudo-logic of the regex. If you want, I can form a regex for you that covers these cases.
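For illustration, the same rule can be sketched with plain string functions instead of a regex. The function name stripFirstLabel is hypothetical, and this is just one way to express the logic described above:

```php
<?php
// Hypothetical helper: drops the first label (and its dot)
// only when the host contains more than one dot.
function stripFirstLabel(string $host): string
{
    if (substr_count($host, '.') < 2) {
        return $host; // bare domain such as domain.com: nothing to strip
    }
    // Everything after the first dot.
    return substr($host, strpos($host, '.') + 1);
}

echo stripFirstLabel('www-139.in.ibm.com'), "\n"; // in.ibm.com
echo stripFirstLabel('music.domain.com'), "\n";   // domain.com
echo stripFirstLabel('domain.com'), "\n";         // domain.com
```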

Rutesh Makhijani 2009-07-09 08:12:18

This is what I wanted ... can you help me with how to do it?

Jasim 2009-07-09 08:15:28

Dixon's suggestion does that.

Emil H 2009-07-09 08:19:22

Dixon's regex will not work on bare domains. For example, "domain.com" will be turned into "com". Here is another regex that conforms to Rutesh's pseudo logic: `$domain = preg_replace('/^(?(?=[^.]++\.[^.]++\.)[^.]++\.|)/', '', $domain);`

Geert 2009-07-09 08:31:40

This will work for TLDs like .com, .net, etc., but how about a domain like geograph.org.uk? You would end up with the invalid org.uk.
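One possible way around this, sketched under the assumption that you maintain a list of two-label suffixes yourself: the list and the function name below are purely illustrative and far from complete. A real solution would more likely consult something like the Public Suffix List.

```php
<?php
// Illustrative only: a tiny hand-picked list of two-label suffixes (an assumption).
const TWO_LABEL_SUFFIXES = ['org.uk', 'co.uk', 'com.au'];

// Hypothetical helper: keeps the suffix plus one label in front of it.
function registrableDomain(string $host): string
{
    $labels  = explode('.', $host);
    $lastTwo = implode('.', array_slice($labels, -2));
    // Keep three labels when the last two form a listed suffix, otherwise two.
    $keep = in_array($lastTwo, TWO_LABEL_SUFFIXES, true) ? 3 : 2;
    return implode('.', array_slice($labels, -$keep));
}

echo registrableDomain('www.geograph.org.uk'), "\n"; // geograph.org.uk
echo registrableDomain('geograph.org.uk'), "\n";     // geograph.org.uk
echo registrableDomain('music.domain.com'), "\n";    // domain.com
```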

Paul Dixon 2009-07-09 09:19:50
