ansaurus

Question

What would be the best way to extract the host portion of a URL with a regular expression?

Answer 1

+4 A:

Do you need to use a regex? Most languages have built-in support for parsing URLs. For instance, Java has java.net.URL, Python has the urlparse module (urllib.parse in Python 3), and Ruby has the URI module. You can use these to query the different parts of a given URL, including the host.
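As a rough illustration of the parser-based route, here is a minimal Python 3 sketch using urllib.parse. The helper name host_of and the sample URLs are made up for this example; they are not from the question.

```python
from urllib.parse import urlsplit

def host_of(url):
    """Return the host of an absolute URL, or None if no host is present.

    host_of is a hypothetical helper name used only for this sketch.
    """
    # urlsplit only recognises a host when the URL has a "//" authority
    # part, so a bare "www.example.com/path" yields None here.
    return urlsplit(url).hostname

if __name__ == "__main__":
    # Sample URLs are assumptions for illustration.
    print(host_of("http://user:pw@www.example.com:8080/path?q=1"))
    print(host_of("ftp://ftp.example.org/pub/file.txt"))
```

The parser strips the user info and port for you, which is the part that tends to go wrong in hand-written patterns.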

Kevin 2009-02-06 01:13:23

Answer 2

+1 A:

I've tested this in PHP and it works on all of your examples:

/^(ftp:\/\/|https?:\/\/)?(.+@)?([a-zA-Z0-9\.\-]+).*$/
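For readers who want to try the pattern outside PHP, here is a hedged Python sketch of the same expression. The surrounding / delimiters are dropped because Python's re module does not use them; the function name and the sample URLs are assumptions for illustration. The host is in the third capturing group.

```python
import re

# Same pattern as above, without PHP's / delimiters (so the slashes
# no longer need escaping).
HOST_RE = re.compile(r"^(ftp://|https?://)?(.+@)?([a-zA-Z0-9.\-]+).*$")

def extract_host(url):
    """Return capture group 3 (the host), or None if nothing matches.

    extract_host is a hypothetical helper name used only for this sketch.
    """
    match = HOST_RE.match(url)
    return match.group(3) if match else None

if __name__ == "__main__":
    # Sample URLs are assumptions, not the asker's original examples.
    for url in ("http://www.example.com/index.html",
                "ftp://user:secret@ftp.example.org/pub",
                "example.com:8080/x"):
        print(url, "->", extract_host(url))
```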

yjerem 2009-02-06 01:22:34

Answer 3

A:

You can check this link: http://regexlib.com/Search.aspx?k=host

Regards

Matias 2009-02-06 01:25:06

Answer 4

+2 A:

Jeremy Ruten's answer is close but will fail if an @ appears anywhere after the hostname. I'd suggest:

(everything that isn't an '@') //optional

(?:[^@:\/]*@)?

The colon and slash prevent matching past the domain if @ appears after the domain. Note the non-capturing parens.

(everything that isn't a '/' up to the first '/' IF it's there) //this is the host group that I want

([^:\/]+)

Note the capturing parens.

(everything else that trails) //optional

Since the parens capture the hostname and only the hostname, there's no need to continue matching.

So, putting it all together you get:

/^(?:ftp|https?):\/\/(?:[^@:\/]*@)?([^:\/]+)/

(Note that the first two paren groupings are non-capturing -- hopefully your regex library supports that.)
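To see the difference in practice, here is a small Python sketch that runs the earlier pattern and the combined one side by side. The constant and function names and the test URLs are assumptions for illustration only. It also shows a limitation of the combined pattern as written: because the optional user-info group excludes ':', a user:password@ prefix is not skipped, and the scheme is mandatory.

```python
import re

# Earlier pattern (host in group 3) and the combined pattern above
# (host in group 1), both written without PHP's / delimiters.
EARLIER_RE = re.compile(r"^(ftp://|https?://)?(.+@)?([a-zA-Z0-9.\-]+).*$")
COMBINED_RE = re.compile(r"^(?:ftp|https?)://(?:[^@:/]*@)?([^:/]+)")

def host_earlier(url):
    match = EARLIER_RE.match(url)
    return match.group(3) if match else None

def host_combined(url):
    match = COMBINED_RE.match(url)
    return match.group(1) if match else None

if __name__ == "__main__":
    # '@' after the hostname: the greedy (.+@)? swallows the real host.
    tricky = "http://example.com/path@x"
    print(host_earlier(tricky))   # x
    print(host_combined(tricky))  # example.com

    # Limitation: the ':' in user:password stops the user-info group,
    # so the user name is captured instead of the host.
    print(host_combined("ftp://user:secret@ftp.example.org/pub"))  # user
```

If credentials with passwords can appear in the input, a URL parser from the standard library is likely the safer choice.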

Neil Mix 2009-02-06 02:05:17
