
I want to match a web address with a regex that captures both http://www.google.com and www.google.com, i.e. with and without the protocol.

+2  A: 

Try RegexLib.

Mitch Wheat
Exactly the reply (and the link) I would have provided!
Cerebrus
+1  A: 

Read RFC 3986. It is not as easy as you might think. The job is easier if you only have a small set of URLs to parse.
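RFC 3986 itself (Appendix B) gives a regular expression for splitting a URI reference into its components. This minimal Python sketch, assuming only the standard `re` module, applies it to the two addresses from the question. It shows why the scheme-less form is the hard part: without `http://`, `www.google.com` is read as a relative path, not as a host. The helper name `parts` is hypothetical.

```python
import re

# Regex from RFC 3986, Appendix B, for splitting a URI reference into parts.
# Group 2 = scheme, group 4 = authority (host), group 5 = path.
RFC3986 = re.compile(r"^(([^:/?#]+):)?(//([^/?#]*))?([^?#]*)(\?([^#]*))?(#(.*))?")

def parts(uri):  # hypothetical helper name
    m = RFC3986.match(uri)  # always matches: every component is optional
    return {"scheme": m.group(2), "authority": m.group(4), "path": m.group(5)}

for uri in ("http://www.google.com", "www.google.com"):
    print(uri, "->", parts(uri))
```

With the protocol, the host lands in the authority group; without it, the whole string ends up in the path group, so a generic URI grammar cannot tell that `www.google.com` was meant as a host.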

dirkgently
You can get 'good enough', though, so this answer isn't particularly helpful.
John Sheehan
It's about as good an answer as there can be without a full problem specification, to the extent of competing with the top answer. The problem is that few people read the RFCs; having read one and written an IPv6 parser, I know how hard the job is.
dirkgently
+2  A: 

Well, it's going to depend on exactly what you want to capture ("FTP"? "/index.htm"?), because a general URI capture based on the RFC standard is very hard, but you could start with:

/^((https?\:\/\/)?([\w\d\-]+\.){2,}([\w\d]{2,})((\/[\w\d\-\.]+)*(\/[\w\d\-]+\.[\w\d]{3,4}(\?.*)?)?)?)$/

Complicated, see?
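To see the optional `(https?\:\/\/)?` group at work, here is a minimal Python sketch that runs the pattern above, unchanged apart from dropping the JavaScript-style `/.../` delimiters, against the addresses from the question. The helper name `is_web_address` is hypothetical.

```python
import re

# The pattern above without its /.../ delimiters, split over two raw strings.
# (https?\:\/\/)? makes the protocol optional; ([\w\d\-]+\.){2,} requires at
# least two dot-terminated labels before the final part, e.g. "www." + "google.".
PATTERN = re.compile(
    r"^((https?\:\/\/)?([\w\d\-]+\.){2,}([\w\d]{2,})"
    r"((\/[\w\d\-\.]+)*(\/[\w\d\-]+\.[\w\d]{3,4}(\?.*)?)?)?)$"
)

def is_web_address(text):  # hypothetical helper name
    return PATTERN.match(text) is not None

for s in ("http://www.google.com", "www.google.com", "google.com", "ftp://www.google.com"):
    print(s, is_web_address(s))
```

Both forms from the question match. The bare `google.com` is rejected because of the `{2,}` quantifier, and `ftp://` is rejected because only `http` and `https` are allowed, which is exactly the "what do you want to capture?" question raised above.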

annakata
A: 

Why not

/google\.com/

?

It catches http://www.google.com, www.google.com, and even google.com for free! :-)
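A quick Python sketch of the trade-off raised in the comments: the unanchored `google\.com` finds the domain anywhere, including inside ordinary prose, while an anchored variant (a hypothetical tightening, assuming you only care about this one domain) accepts a string only if it is nothing but the address.

```python
import re

loose = re.compile(r"google\.com")  # the answer's pattern, unanchored
# Hypothetical stricter variant: optional protocol, optional "www.", nothing else.
strict = re.compile(r"^(https?://)?(www\.)?google\.com$")

samples = [
    "http://www.google.com",
    "www.google.com",
    "google.com",
    "I could try searching on google.com instead",
]
for s in samples:
    # search() looks anywhere in the string; match() plus ^...$ needs the whole string
    print(repr(s), bool(loose.search(s)), bool(strict.match(s)))
```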

Igor Oks
It also catches "Well I guess I could try searching for this regex on google.com, nah SO is better than google these days. Hmm, I wonder what's for lunch. Mmmm. Bacon"
annakata
Which, if you enter it in most browsers, will bring you to Google :)
MSalters
SO is meant to be a reference, so that when you search Google you end up here instead of on another crappy site. So this question is fine.
John Sheehan
@John: Please stop making paranoid comments and downvotes. This was a legitimate answer, advising how to match specific domain names (e.g. google.com).
Igor Oks
A: 

See the answers to "Getting parts of a URL (Regex)". They may be suitable for you.

Note: not an exact duplicate.
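If the goal is to get the parts of the address rather than to validate it, a regex may not be needed at all. A minimal sketch, assuming Python's standard `urllib.parse.urlsplit`: a scheme-less string like `www.google.com` is treated as a path, so the hypothetical helper `host_of` prefixes `//` to make it parse as a network location.

```python
from urllib.parse import urlsplit

def host_of(address):  # hypothetical helper, not from the linked question
    # Without "://", urlsplit puts "www.google.com" in .path rather than .netloc,
    # so prepend "//" to mark the start of the authority (host) part.
    if "://" not in address:
        address = "//" + address
    return urlsplit(address).hostname

for a in ("http://www.google.com", "www.google.com", "www.google.com/index.htm"):
    print(a, "->", host_of(a))
```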

strager