I want to match a web address with a regex that captures both http://www.google.com and www.google.com, i.e. with and without the protocol.
Exactly the reply (and the link) I would have provided!
Cerebrus
2009-02-20 09:33:18
+1
A:
Read RFC 3986. It is not as easy as you might think. The job is easier if you only have a small set of URLs to parse.
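For illustration, RFC 3986 itself gives a non-validating regex in Appendix B that splits a URI reference into its components. Below is a minimal Python sketch of it (Python is an arbitrary choice since the question names no language, and the names URI_PARTS and split_uri are made up for this example). It shows why input without a protocol is awkward: the host lands in the path component, not the authority.

```python
import re

# Regex from RFC 3986, Appendix B. It only separates components;
# it does not check that the input is a valid URI.
URI_PARTS = re.compile(
    r"^(([^:/?#]+):)?(//([^/?#]*))?([^?#]*)(\?([^#]*))?(#(.*))?"
)

def split_uri(text):
    """Return (scheme, authority, path, query, fragment).

    Components that are absent come back as None; the path may be "".
    """
    m = URI_PARTS.match(text)  # every part is optional, so this always matches
    return m.group(2), m.group(4), m.group(5), m.group(7), m.group(9)

print(split_uri("http://www.google.com"))
print(split_uri("www.google.com"))
```

With the protocol present, www.google.com is reported as the authority. Without it, the same text is reported as a relative path, so a matcher that must accept both forms needs extra rules beyond what the RFC grammar gives you.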
dirkgently
2009-02-20 09:29:20
You can get 'good enough' though, so this answer isn't particularly helpful
John Sheehan
2009-03-09 22:16:08
It's about as good an answer as was possible without a full problem specification, to the extent of being a competitor for the top answer. The problem is that few people read the RFCs; having read one and written an IPv6 parser, I know how hard the job is.
dirkgently
2009-03-10 06:22:38
+2
A:
Well it's going to depend on exactly what you want to capture ("FTP"? "/index.htm"?) because a general URI capture based on the RFC standard is very hard, but you could start with:
/^((https?\:\/\/)?([\w\d\-]+\.){2,}([\w\d]{2,})((\/[\w\d\-\.]+)*(\/[\w\d\-]+\.[\w\d]{3,4}(\?.*)?)?)?)$/
Complicated, see?
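As a rough way to try the pattern out, here is a small Python sketch. It assumes the pattern is carried over unchanged from the /.../ literal into a raw string; the helper name is_web_address and the sample inputs are made up for the example.

```python
import re

# Same pattern as above, moved from a /.../ literal into a Python raw string.
WEB_ADDRESS = re.compile(
    r"^((https?\:\/\/)?([\w\d\-]+\.){2,}([\w\d]{2,})"
    r"((\/[\w\d\-\.]+)*(\/[\w\d\-]+\.[\w\d]{3,4}(\?.*)?)?)?)$"
)

def is_web_address(text):
    """True if the whole string matches the pattern."""
    return WEB_ADDRESS.match(text) is not None

samples = [
    "http://www.google.com",
    "www.google.com",
    "https://www.google.com/index.htm",
    "google.com",             # only one dotted label before the TLD
    "ftp://www.google.com",   # protocol other than http/https
]
for s in samples:
    print(s, is_web_address(s))
```

The first three samples match. The last two do not: the {2,} quantifier requires at least two dot-terminated labels, so a bare google.com is rejected, and only http and https are accepted as protocols.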
annakata
2009-02-20 09:34:38
A:
Why not
/google\.com/
?
It catches http://www.google.com , www.google.com , and even google.com for free! :-)
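A quick Python sketch of what this does in practice. The unanchored pattern is a substring search, so it accepts any text that contains google.com. The STRICT pattern below is a hypothetical tightening, not part of the suggestion above: it allows only an optional protocol and an optional "www." around the domain. The function names are made up for the example.

```python
import re

# The pattern from this answer: matches anywhere inside the text.
LOOSE = re.compile(r"google\.com")

# Hypothetical stricter variant: optional http(s)://, optional www.,
# then the domain, and nothing else.
STRICT = re.compile(r"(https?://)?(www\.)?google\.com")

def loose_match(text):
    return LOOSE.search(text) is not None

def strict_match(text):
    # fullmatch requires the pattern to cover the entire string
    return STRICT.fullmatch(text) is not None

for s in ["http://www.google.com", "www.google.com", "google.com",
          "I searched on google.com today", "notgoogle.com"]:
    print(s, loose_match(s), strict_match(s))
```

Both versions accept the three address forms. Only the loose one also accepts the sentence and notgoogle.com, which is the price of leaving the pattern unanchored.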
Igor Oks
2009-02-20 10:16:25
It also catches "Well I guess I could try searching for this regex on google.com, nah SO is better than google these days. Hmm, I wonder what's for lunch. Mmmm. Bacon"
annakata
2009-02-20 11:01:12
SO is meant to be a reference so that when you search Google, you end up here instead of on another crappy site. So this question is fine.
John Sheehan
2009-03-09 22:17:06
@John: Please stop making paranoid comments and downvotes. This was a legitimate answer, advising how to match specific domain names (e.g. google.com).
Igor Oks
2009-03-10 08:46:11
A:
See the answers to "Getting parts of a URL (Regex)". They may give you what you need.
Note: not an exact duplicate.
strager
2009-03-09 22:17:49