I am trying to construct a regex to extract the domain from a given URL.

For example:

http://www.abc.google.com/
http://abc.google.com/
https://www.abc.google.com/
http://abc.google.com/

should give

abc.google.com

Any help is very much appreciated.

Thanks

+1  A: 

I don't know much about Ruby, but this regex pattern gives you the last three parts of the URL, excluding the trailing slash, with a minimum of two characters per part.

([\w-]{2,}\.[\w-]{2,}\.[\w-]{2,})/$
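For illustration, here is a minimal Ruby sketch of how this pattern behaves on the URLs from the question. The method name `last_three_parts` is made up for the example, and the slash is escaped because the pattern is written as a `/.../` literal.

```ruby
# Hypothetical helper: returns the last three dot-separated parts of a URL
# that ends in a slash, or nil when the pattern does not match.
def last_three_parts(url)
  # Capture group 1 holds the three parts; the trailing slash is excluded.
  url[/([\w-]{2,}\.[\w-]{2,}\.[\w-]{2,})\/$/, 1]
end

last_three_parts('http://www.abc.google.com/')  # => "abc.google.com"
last_three_parts('http://abc.google.com/')      # => "abc.google.com"
last_three_parts('http://www.google.com/')      # => "www.google.com"
last_three_parts('http://google.com/')          # => nil
```

Note that the pattern counts parts rather than recognising "www", so a three-part host such as www.google.com is returned unchanged, and a two-part host does not match at all.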
Fabian
Should be `([\w-]{2,}\.[\w-]{2,}\.[\w-]{2,})\/$`. +1 though.
Sarfraz
What about (?<=//)[^/]+
SchlaWiener
+5  A: 
URI.parse('http://www.abc.google.com/').host
#=> "www.abc.google.com"

Not a regex, but probably more robust than anything we come up with here.

URI.parse('http://www.abc.google.com/').host.gsub(/^www\./, '')

If you want to remove the www. as well, this will work without raising any errors if the www. is not there.
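The two steps above could be wrapped up roughly like this. It is only a sketch: the method name `bare_host` is made up, and returning nil for input that cannot be parsed or has no host is an assumption about what the caller wants.

```ruby
require 'uri'

# Hypothetical helper: host of the URL without a leading "www.",
# or nil when the input cannot be parsed or has no host part.
def bare_host(url)
  host = URI.parse(url).host
  # \A anchors at the start of the string, so only a leading "www." is removed.
  host && host.sub(/\Awww\./, '')
rescue URI::InvalidURIError
  nil
end

bare_host('http://www.abc.google.com/')  # => "abc.google.com"
bare_host('https://www.google.com/')     # => "google.com"
bare_host('http://abc.google.com/')      # => "abc.google.com"
bare_host('www.google.com')              # => nil (no scheme, so no host)
```

Because the scheme and path are handled by the parser, the same call works for http and https URLs and for URLs with or without a trailing slash.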

Squeegy
I want to remove the www. too.
railscoder
A: 
Jörg W Mittag
I might have framed the question wrongly. What I am trying to do is just remove the leading "http://www." and everything after ".com". So "http://www.google.com/" should give google.com, and "http://www.abc.google.com/" should return abc.google.com.
railscoder
Why do you want to get abc.google.com for http://abc.google.com/ but google.com for http://www.google.com/? What makes the 'www' special? It is just a convention that HTTP servers usually run on the host named www, but it doesn't have to be that way.
SchlaWiener
Yeah. I use a web service which strips off the http and www parts of the site name. To compare the results, I need to do the same before doing the comparison.
railscoder