Hello!

I am looking for a regex that validates simple website addresses, i.e.

I need it for the 'Website' field in a contact-details form; when the user clicks the address, it opens in IE. It doesn't have to be strict. I just don't want the user to enter 'I love milk' or 'google', etc.

Rather than rack my brain writing my own and struggling to find every exception, I thought I would learn from the community's experience. If anyone has a good regex or a link, please post it.

Thanks a lot.

+1  A: 
https?://([-\w\.]+)+(:\d+)?(/([\w/_\.]*(\?\S+)?)?)?

excerpt from http://snipplr.com/view/2371/regex-regular-expression-to-match-a-url/


 (https?://)?([-\w\.]+)+(:\d+)?

Revised per the suggestion, but I think people had better follow the clue and figure out the answer themselves. Anyway, even when copying and pasting, people should know what they are doing.
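To see what the revised pattern accepts, here is a minimal Python sketch, assuming the standard `re` module and whole-string matching. The helper name `looks_like_site` and the sample inputs are made up for illustration.

```python
import re

# Pattern copied verbatim from the revised answer; the scheme is optional.
REVISED = re.compile(r'(https?://)?([-\w\.]+)+(:\d+)?')

def looks_like_site(text):
    """Return True when the whole input fits the revised pattern."""
    # fullmatch requires the entire string to match, not just a prefix.
    return REVISED.fullmatch(text) is not None

for sample in ['www.example.com:8080', 'http://example.com', 'google', 'I love milk']:
    print(sample, '->', looks_like_site(sample))
```

With the scheme optional, a bare word like 'google' passes, because nothing in the pattern requires a dot; 'I love milk' is rejected only because of the spaces. The nested `+` in `([-\w\.]+)+` can also backtrack heavily on long inputs that fail to match, so it may be worth simplifying before using it on untrusted text.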

Dyno Fu
That doesn't satisfy the OP's requirements: he doesn't want to force the 'http' prefix. Drop the https?:// part, however, and it should.
Michael Bray
+6  A: 

From RFC 3986, Uniform Resource Identifiers (URI): Generic Syntax, appendix B (p. 50):

^(([^:/?#]+):)?(//([^/?#]*))?([^?#]*)(\?([^#]*))?(#(.*))?

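This expression breaks a well-formed URI reference into its components. Every part of it is optional, so it matches almost any string and does not reject bad input on its own. The match groups give you the various pieces, which are: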

scheme    = $2
authority = $4
path      = $5
query     = $7
fragment  = $9
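
A minimal Python sketch of pulling those groups out, assuming the standard `re` module. The helper name `split_uri` and the sample URL are made up for illustration; the second call shows what happens with input that is not a URL at all.

```python
import re

# Regular expression from RFC 3986, appendix B.
URI_PARTS = re.compile(r'^(([^:/?#]+):)?(//([^/?#]*))?([^?#]*)(\?([^#]*))?(#(.*))?')

def split_uri(text):
    """Split text into the five URI components (None when a part is absent)."""
    # Every group is optional, so match() never returns None here.
    m = URI_PARTS.match(text)
    return {
        'scheme': m.group(2),
        'authority': m.group(4),
        'path': m.group(5),
        'query': m.group(7),
        'fragment': m.group(9),
    }

print(split_uri('http://www.example.com:8080/docs/index.html?lang=en#top'))
# Plain text still matches: everything ends up in 'path'.
print(split_uri('I love milk'))
```

For a 'Website' field, one possible check on top of this is to require that `scheme` is http or https and that `authority` is non-empty.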
Wayne Conrad
The official reference is the best...
Dyno Fu
A: 

I also mentioned RFC 3986, but it is a bit too generic, as it is made to match relative URLs as well. Obviously, in the OP's case, we want absolute public URLs.

Something like ^(https?://)?(?:[\w.]+)\.(?:[\w:.]+) seems more realistic. I deliberately exclude sites that need a username/password pair, but accept a port number.
This kind of expression will break once Unicode URLs become commonplace...

[EDIT] I accepted any scheme; I should restrict it more, I suppose. There is no need to accept ftp addresses or bzr+ssl:// ones for the case described...
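
A quick Python sketch of how that expression behaves on the inputs from the question, assuming the standard `re` module. The helper name `looks_like_public_url` and the sample strings are made up for illustration.

```python
import re

# Expression copied from the answer: optional http(s) scheme,
# then at least one literal dot between two runs of word characters.
PUBLIC_URL = re.compile(r'^(https?://)?(?:[\w.]+)\.(?:[\w:.]+)')

def looks_like_public_url(text):
    """Return True when the start of the input fits the expression."""
    # The pattern has ^ but no $, so trailing text is not checked.
    return PUBLIC_URL.match(text) is not None

for sample in ['example.com', 'http://www.example.com:8080', 'google', 'I love milk']:
    print(sample, '->', looks_like_public_url(sample))
```

Requiring the dot is what rejects 'google' here. Since there is no end anchor, something like 'example.com is great' would still pass; appending `$` (or using `fullmatch`) is one way to tighten that, at the cost of rejecting addresses with a path.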

PhiLho
A: 

Validating or detecting URLs is not as straightforward as it looks. This is a blog post which digs a little deeper into this topic:

http://www.blog.activa.be/2008/10/30/ExtractingURLsNotPerfectButQuotgoodEnoughquot.aspx

and also:

http://www.codinghorror.com/blog/archives/001181.html

Philippe Leybaert
Good pages. Everyone keeps trying to reinvent the RFC. Is it deficient?
Wayne Conrad
A: 

One option that doesn't use a regex, though the address must start with a protocol such as http://:

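using System;

static bool validateAddress(string address)
{
    Uri valid = null;
    // Accept only absolute URIs whose scheme is http or https.
    // Checking the parsed scheme avoids accepting prefixes such as "httpx://".
    return
        Uri.TryCreate(address, UriKind.Absolute, out valid) &&
        (valid.Scheme == Uri.UriSchemeHttp || valid.Scheme == Uri.UriSchemeHttps);
}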
Rubens Farias
A: 

Personally, I tend to look at AddedBytes' regexp cheat sheet.

Anders