I have a URL which can be in any of the following formats:

http://example.com
https://example.com
http://example.com/foo
http://example.com/foo/bar
www.example.com
example.com
foo.example.com
www.foo.example.com
foo.bar.example.com
http://foo.bar.example.com/foo/bar
example.net/foo/bar

Essentially, I need to be able to match any normal URL. How can I extract example.com (or .net, or whatever the TLD happens to be; I need this to work with any TLD) from all of these via a single regex?

This is in PHP.

Thanks for the help!

+5  A: 

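Well, you can use parse_url to get the host. It only fills in the host when the URL has a scheme, so add one first for inputs like www.example.com:

// Prepend a scheme when none is present; otherwise parse_url() treats the whole string as a path
$info = parse_url(preg_match('#^[a-z][a-z0-9+.-]*://#i', $url) ? $url : 'http://' . $url);
$host = $info['host'];

Then split the host on the periods and keep only the last two parts (the domain name and the TLD):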

$host_names = explode(".", $host);
$bottom_host_name = $host_names[count($host_names)-2] . "." . $host_names[count($host_names)-1];

Not very elegant, but should work.


If you want an explanation, here it goes:

First we grab the host, i.e. everything between the scheme (http://, etc.) and the path, by using parse_url's capabilities to... well... parse URLs. :)

Then we take the host name and split it into an array at the periods, so test.world.hello.myname would become:

array("test", "world", "hello", "myname");

After that, we take the number of elements in the array (4).

Then we subtract 2 from that count to get the index of the second-to-last string (the domain name, or example, in your example).

Then we subtract 1 from the count to get the index of the last string (because array keys start at 0), also known as the TLD.

Then we combine those two parts with a period, and you have your base host name.
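
Putting those steps together, here is a minimal sketch of the whole thing as one function. The name base_host_name is just something I made up for illustration, and it assumes that any input without a scheme is a bare host (optionally followed by a path), which is why it prepends http:// before parsing. It still only keeps the last two parts of the host, so it does not handle two-segment TLDs.

```php
<?php
// Hypothetical helper wrapping the steps above; the name is illustrative only.
function base_host_name(string $url): ?string
{
    // parse_url() only reports a host when a scheme is present, so add one if missing.
    if (!preg_match('#^[a-z][a-z0-9+.-]*://#i', $url)) {
        $url = 'http://' . $url;
    }

    $host = parse_url($url, PHP_URL_HOST);
    if (!is_string($host) || $host === '') {
        return null; // could not find a host at all
    }

    // Split on the periods and count the parts.
    $labels = explode('.', $host);
    $count = count($labels);
    if ($count < 2) {
        return $host; // single-label host such as "localhost": nothing to trim
    }

    // Second-to-last part (domain name) plus last part (TLD).
    return $labels[$count - 2] . '.' . $labels[$count - 1];
}

echo base_host_name('http://foo.bar.example.com/foo/bar'), "\n"; // example.com
echo base_host_name('example.net/foo/bar'), "\n";               // example.net
```

Returning null when no host can be found keeps the caller from silently working with an empty string.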

Chacha102
What about two-segment top-level domains like `co.uk`?
eyelidlessness
@eyelidlessness Won't work, unfortunately.
Chacha102
A: 

For two-segment top-level domains, see this question: http://stackoverflow.com/questions/1201194/php-getting-domain-name-from-subdomain

CM
A: 

The chosen solution is still not perfect. What happens with extensions such as .co.uk, .org.il and the like? And what about a subdomain such as sub.domain.domain.co.uk?

This requires a more complicated solution.
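
As a rough sketch of what that could look like: check whether the last two parts of the host form a known two-segment suffix and, if so, keep three parts instead of two. The function name registrable_domain and the tiny default suffix list are assumptions made up for illustration. A real solution would presumably need a complete list, such as the Public Suffix List, rather than these two entries.

```php
<?php
// Hypothetical helper; the default $multiLabelSuffixes is a tiny illustrative
// sample, not a complete list of two-segment suffixes.
function registrable_domain(string $host, array $multiLabelSuffixes = ['co.uk', 'org.il']): string
{
    $labels = explode('.', strtolower($host));
    $count = count($labels);

    // Normally keep the last two parts (domain name + TLD).
    $keep = 2;

    // Keep three parts when the last two form a known two-segment suffix.
    if ($count >= 3) {
        $lastTwo = $labels[$count - 2] . '.' . $labels[$count - 1];
        if (in_array($lastTwo, $multiLabelSuffixes, true)) {
            $keep = 3;
        }
    }

    return implode('.', array_slice($labels, -$keep));
}

echo registrable_domain('sub.domain.domain.co.uk'), "\n"; // domain.co.uk
echo registrable_domain('www.foo.example.com'), "\n";     // example.com
```

This takes a host name rather than a full URL, so you would still run parse_url first to get the host.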

Exodus
well, provide it!
SilentGhost