
I'm extracting the host from a URL and am getting jammed up on making the last / optional. The regexp needs to accept all of the following:

http://a.b.com:8080/some/path/file.txt
or
ftp://a.b.com:8080/some/path
or
ftp://[email protected]/some/path
or
http://a.b.com
or 
a.b.com/some/path

and return a.b.com in every case.

So (ftp://|http://)? optionally matches the first part, and then it gets hairy. Without pasting my ugly (and wrong) regexp here, this is what I want, in plain English:

(everything that isn't an '@') //optional
(everything that isn't a '/', up to the first '/' IF it's there) //this is the host group that I want
(everything else that trails) //optional
+4  A: 

Do you need to use a regex? Most languages have support for parsing URLs. For instance, Java has java.net.URL, Python has the urlparse module (urllib.parse in Python 3), and Ruby has the URI module. You can use these to query the different parts of a given URL.
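A minimal Python 3 sketch of the library approach, using urllib.parse.urlsplit. The helper name extract_host is hypothetical, and the sketch assumes that an input without "://" begins with the host, because urlsplit only recognizes a host after "//". "user@" is a placeholder for the username that is obfuscated in the question's third example.

```python
from urllib.parse import urlsplit


def extract_host(url):
    """Return the host part of url (lowercased), or None if there is none."""
    # urlsplit only fills in netloc when it sees "//" after the optional
    # scheme; a bare "a.b.com/some/path" would be parsed as a path.
    # Assumption: input without "://" starts with the host, so prefix "//".
    if "://" not in url:
        url = "//" + url
    # .hostname strips any "user@" prefix and ":port" suffix.
    return urlsplit(url).hostname


examples = [
    "http://a.b.com:8080/some/path/file.txt",
    "ftp://a.b.com:8080/some/path",
    "ftp://user@a.b.com/some/path",  # "user" is a placeholder username
    "http://a.b.com",
    "a.b.com/some/path",
]
for u in examples:
    print(extract_host(u))  # a.b.com for every example
```

Note that .hostname also lowercases the host, which is usually what you want when comparing hosts.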

Kevin
+1  A: 

I've tested this in PHP and it works on all of your examples:

/^(ftp:\/\/|https?:\/\/)?(.+@)?([a-zA-Z0-9\.\-]+).*$/
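To try the pattern outside PHP, here is a rough Python translation; it assumes Python's re module treats these particular constructs the same way PCRE does, which holds for everything used here. The host is capture group 3. "user@" stands in for the obfuscated username in the question, and the last line uses a made-up URL with an @ after the host to show where the greedy (.+@) group goes wrong.

```python
import re

# Python port of the PHP pattern: the "/" delimiters are dropped and each
# "\/" becomes a plain "/". Capture group 3 is the host.
HOST_RE = re.compile(r"^(ftp://|https?://)?(.+@)?([a-zA-Z0-9.\-]+).*$")


def extract_host(url):
    """Return capture group 3 (the host), or None if the pattern fails."""
    m = HOST_RE.match(url)
    return m.group(3) if m else None


examples = [
    "http://a.b.com:8080/some/path/file.txt",
    "ftp://a.b.com:8080/some/path",
    "ftp://user@a.b.com/some/path",  # "user" is a placeholder username
    "http://a.b.com",
    "a.b.com/some/path",
]
for u in examples:
    print(extract_host(u))  # a.b.com for every example

# Hypothetical URL with an "@" after the host: the greedy (.+@) group runs
# up to the last "@", so group 3 picks up the wrong host.
print(extract_host("http://a.b.com/page?mail=me@example.org"))  # example.org
```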
yjerem
A: 

You can check this link: http://regexlib.com/Search.aspx?k=host

Regards

Matias
+2  A: 

Jeremy Ruten's answer is close, but it returns the wrong host if an @ appears anywhere after the hostname. I'd suggest:

(everything that isn't an '@') //optional

(?:[^@:\/]*@)?

Excluding the colon and slash from the character class keeps the match from running past the domain when an @ appears after it. Note the non-capturing parens.
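A small Python sketch of just this fragment, applied to the text that follows "scheme://". The helper userinfo_prefix and both sample strings are illustrative: with a real user prefix the fragment consumes "user@", and when the only @ comes after a slash it consumes nothing.

```python
import re

# The optional user-info fragment on its own. "@", ":" and "/" are all
# excluded from the character class, so the match cannot cross the first
# "/" to reach an "@" further along the URL.
USERINFO_RE = re.compile(r"(?:[^@:/]*@)?")


def userinfo_prefix(rest):
    """Return what the optional fragment consumes at the start of rest."""
    # The group is optional, so match() always succeeds (possibly with "").
    return USERINFO_RE.match(rest).group(0)


print(repr(userinfo_prefix("user@a.b.com/some/path")))            # 'user@'
print(repr(userinfo_prefix("a.b.com/page?mail=me@example.org")))  # ''
```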

(everything that isn't a '/' up to the first '/' IF it's there) //this is the host group that I want

([^:\/]+)

Note the capturing parens.

(everything else that trails) //optional

Since the parens capture the hostname and only the hostname, there's no need to continue matching.
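To illustrate both points in Python: the host fragment stops at the first ':' or '/', and re.match only needs a match at the start of the string, so the rest of the URL is simply left unmatched. The sample string is illustrative.

```python
import re

# The host fragment: one or more characters that are neither ":" nor "/".
HOST_FRAGMENT = re.compile(r"([^:/]+)")

m = HOST_FRAGMENT.match("a.b.com:8080/some/path")
print(m.group(1))  # a.b.com
# match() succeeds on a prefix, so ":8080/some/path" never has to be matched.
print(m.end())     # 7
```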

So, putting it all together you get:

/^(?:ftp|https?):\/\/(?:[^@:\/]*@)?([^:\/]+)/

(Note that the first two paren groupings are non-capturing -- hopefully your regex library supports that.)
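A Python check of the combined pattern against the question's examples, again with "user@" as a placeholder username. It turns up one gap: because the scheme group is not optional, the schemeless a.b.com/some/path example does not match at all. The OPTIONAL_SCHEME variant below wraps the scheme in an optional non-capturing group; that adjustment is mine, not part of the answer.

```python
import re

# The combined pattern as given above: the scheme is required.
STRICT = re.compile(r"^(?:ftp|https?)://(?:[^@:/]*@)?([^:/]+)")
# Variant with the scheme made optional, so a bare host/path also matches.
OPTIONAL_SCHEME = re.compile(r"^(?:(?:ftp|https?)://)?(?:[^@:/]*@)?([^:/]+)")


def host(pattern, url):
    """Return capture group 1 for url, or None if the pattern fails."""
    m = pattern.match(url)
    return m.group(1) if m else None


examples = [
    "http://a.b.com:8080/some/path/file.txt",
    "ftp://a.b.com:8080/some/path",
    "ftp://user@a.b.com/some/path",  # "user" is a placeholder username
    "http://a.b.com",
    "a.b.com/some/path",
]
for u in examples:
    # STRICT gives None for the last example; OPTIONAL_SCHEME gives a.b.com.
    print(u, host(STRICT, u), host(OPTIONAL_SCHEME, u))

# Caveat: ":" is excluded from the user-info class, so "user:pw@" is not
# consumed and the host group captures "user" instead of the host.
print(host(OPTIONAL_SCHEME, "ftp://user:pw@a.b.com/x"))  # user
```

The last line shows a further limitation worth knowing about: a user:password@ prefix is not skipped, so if your URLs can carry credentials, the user-info class needs to allow ':'.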

Neil Mix