Given this regex:

^((https?|ftp):(\/{2}))?(((25[0-5]|2[0-4][0-9]|[01]?[0-9][0-9]?)\.){3}
(25[0-5]|2[0-4][0-9]|[01]?[0-9][0-9]?))|(((([a-zA-Z0-9]+)(\.)*?))(\.)([a-z]{2}
|com|org|net|gov|mil|biz|info|mobi|name|aero|jobs|museum){1})

Reformatted for readability:

@"^((https?|ftp):(\/{2}))?" + // http://, https://, ftp:// - Protocol Optional
@"(" + // Begin URL payload format section
@"((25[0-5]|2[0-4][0-9]|[01]?[0-9][0-9]?)\.){3}(25[0-5]|2[0-4][0-9]|[01]?[0-9][0-9]?)" + // IPv4 Address support
@")|("+ // Delimit supported payload types
@"((([a-zA-Z0-9]+)(\.)*?))(\.)([a-z]{2}|com|org|net|gov|mil|biz|info|mobi|name|aero|jobs|museum){1}" + // FQDNs
@")"; // End URL payload format section

How can I make it fail (i.e. not match) on this "fail" test case?

http://www.google

As I am specifying {1} on the TLD section, I would think it would fail without the extension. Am I wrong?

Edit: These are my PASS conditions:

  • "http://www.zi255.com?Req=Post&PID=4",
  • "http://www.zi255.com?Req=Post&ID=4",
  • "http://www.zi255.com/?Req=Post&PID=4",
  • "http://www.zi255.com?Req=Post&PostID=4",
  • "http://www.zi255.com/?Req=Post&ID=4"
  • "http://www.zi255.com?Req=Post&Post=4",
  • "http://www.zi255.com?Req=Post&Entry=4",
  • "http://www.zi255.com?PID=4"
  • "http://www.zi255.com/Post.aspx?Req=Post&ID=4",
  • "http://www.zi255.com/Post.aspx?Req=Post&PID=4",
  • "http://www.zi255.com/Post.aspx?Req=Post&Post=4",
  • "http://www.zi255.com/Post.aspx?Req=Post&Title=Random%20Post%20Name"
  • "http://www.zi255.com/?Req=Post&Title=Random%20Post%20Name",
  • "http://www.zi255.com?Req=Post&Title=Random%20Post%20Name",
  • "http://www.zi255.com?Req=Post&PostID=4",
  • "http://www.zi255.com?Req=Post&Post=4",
  • "http://www.zi255.com?Req=Post&Entry=4",
  • "http://www.zi255.com?PID=4"
  • "http://www.zi255.com",
  • "http://www.damnednice.com"

These are my FAIL conditions:

  • "http://.com",
  • "http://.com/",
  • "http:/www.google.com",
  • "http:/www.google.com/",
  • "http://www.google",
  • "http://www.googlecom",
  • "http://www.google.c",
  • ".com",
  • "https://www..."
+2  A: 

You need to force your regex to match up to the end of the string: add a $ at the very end of it. Otherwise, the regex can succeed by matching only part of the input (for http://www.google, the substring www.go is enough), which is shorter than your whole string.
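A minimal C# sketch of the difference, assuming the pattern from the question is used unchanged. The class and member names (AnchorDemo, Original, Anchored, Matches) are made up for this illustration.

```csharp
using System.Text.RegularExpressions;

static class AnchorDemo
{
    // Pattern copied from the question, unchanged.
    public const string Original =
        @"^((https?|ftp):(\/{2}))?" +
        @"(((25[0-5]|2[0-4][0-9]|[01]?[0-9][0-9]?)\.){3}(25[0-5]|2[0-4][0-9]|[01]?[0-9][0-9]?))" +
        @"|(((([a-zA-Z0-9]+)(\.)*?))(\.)([a-z]{2}|com|org|net|gov|mil|biz|info|mobi|name|aero|jobs|museum){1})";

    // Same pattern with an end-of-string anchor appended.
    public const string Anchored = Original + "$";

    // Hypothetical helper: true if the pattern matches anywhere in the input.
    public static bool Matches(string pattern, string input)
    {
        return Regex.IsMatch(input, pattern);
    }
}
```

With this, AnchorDemo.Matches(AnchorDemo.Original, "http://www.google") is true, while the same call with AnchorDemo.Anchored is false. Because | has the lowest precedence, the ^ only binds to the IP branch and the $ only to the name branch, so the anchored version also rejects anything with a path or query string after the host, such as "http://www.zi255.com?PID=4".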

Greg Hewgill
tsilb
That wasn't part of your question! I think you need to specify more carefully exactly what you *do* want your regular expression to match, and also just as importantly, what you *don't* want it to match.
Greg Hewgill
Sorry, added my test conditions for clarity.
tsilb
+1 as this did solve the original problem as (poorly) stated.
tsilb
+3  A: 

Sometimes, one catch-all regex is not the best solution, however tempting. While debugging this regex is feasible (see Greg Hewgill's answer), consider doing a couple of tests for different categories of problems, e.g. one test for numerical addresses and one test for named addresses.
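As a rough sketch of that split, assuming the host part has already been separated from the protocol and path: the octet and TLD fragments are borrowed from the question, and the names (HostChecks, IsNumericHost, IsNamedHost) are hypothetical.

```csharp
using System.Text.RegularExpressions;

static class HostChecks
{
    // One IPv4 octet (0-255), as written in the question.
    const string Octet = @"(25[0-5]|2[0-4][0-9]|[01]?[0-9][0-9]?)";

    // Test 1: numerical addresses, anchored at both ends.
    static readonly Regex NumericHost =
        new Regex("^(" + Octet + @"\.){3}" + Octet + "$");

    // Test 2: named addresses: one or more dot-terminated labels, then a TLD.
    static readonly Regex NamedHost = new Regex(
        @"^([a-zA-Z0-9]+\.)+([a-z]{2}|com|org|net|gov|mil|biz|info|mobi|name|aero|jobs|museum)$");

    public static bool IsNumericHost(string host)
    {
        return NumericHost.IsMatch(host);
    }

    public static bool IsNamedHost(string host)
    {
        return NamedHost.IsMatch(host);
    }
}
```

Each test is short enough to check by eye: HostChecks.IsNamedHost("www.zi255.com") is true, while "www.google", "www.googlecom", "www.google.c" and ".com" are all rejected.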

Zano
+1  A: 

The "validate a URL" problem has been solved* numerous times. I suggest you use the System.Uri class; it validates more cases than you can shake a stick at.

The code Uri uri = new Uri("http://whatever"); throws a UriFormatException if it fails validation. That is probably what you'd want.

*) Or kind of solved. It's actually pretty tricky to define what a valid URL is.
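A minimal sketch of wrapping that constructor call, with a hypothetical helper name (UriCheck.ParsesAsAbsoluteUri):

```csharp
using System;

static class UriCheck
{
    // Hypothetical helper: true when the System.Uri constructor accepts the string.
    public static bool ParsesAsAbsoluteUri(string candidate)
    {
        if (candidate == null)
        {
            return false; // new Uri(null) would throw ArgumentNullException
        }

        try
        {
            var uri = new Uri(candidate);
            return uri.IsAbsoluteUri;
        }
        catch (UriFormatException)
        {
            return false; // the string failed System.Uri's validation
        }
    }
}
```

Note that "http://www.google" is a well-formed URI as far as System.Uri is concerned, so this check alone will not enforce the TLD rule from the question.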

Zano
+4  A: 

I'll throw out an alternative suggestion. You may want to combine the parsing done by the built-in System.Uri class with a couple of targeted regexes (or simple string checks when appropriate).

Example:

string uriString = "...";

Uri uri;
if (!Uri.TryCreate(uriString, UriKind.Absolute, out uri))
{
    // Uri is totally invalid!
}
else
{
    // validate the scheme
    if (!uri.Scheme.Equals("http", StringComparison.OrdinalIgnoreCase))
    {
        // not http!
    }

    // validate the authority ('www.blah.com:1234' portion)
    if (uri.Authority // ...)
    {
    }

    // ...
}
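One way the elided parts could be filled in, as a sketch only. The helper name (UrlCheck.IsAcceptableUrl), the accepted schemes, and the host rule are assumptions based on the question's regex and test cases. It also checks uri.Host rather than uri.Authority, so a port is ignored.

```csharp
using System;
using System.Text.RegularExpressions;

static class UrlCheck
{
    // Assumed host rule, adapted from the question's FQDN branch:
    // one or more dot-terminated labels, then a two-letter or listed TLD.
    static readonly Regex NamedHost = new Regex(
        @"^([a-zA-Z0-9]+\.)+([a-z]{2}|com|org|net|gov|mil|biz|info|mobi|name|aero|jobs|museum)$");

    public static bool IsAcceptableUrl(string uriString)
    {
        Uri uri;
        if (!Uri.TryCreate(uriString, UriKind.Absolute, out uri))
        {
            return false; // not parseable as an absolute URI
        }

        // Validate the scheme (the question's regex allows http, https and ftp).
        if (uri.Scheme != Uri.UriSchemeHttp &&
            uri.Scheme != Uri.UriSchemeHttps &&
            uri.Scheme != Uri.UriSchemeFtp)
        {
            return false;
        }

        // Simple string check: insist on a literal "scheme://" prefix,
        // independent of how leniently the parser treats missing slashes.
        if (!uriString.StartsWith(uri.Scheme + "://", StringComparison.OrdinalIgnoreCase))
        {
            return false;
        }

        // Numerical addresses are accepted as parsed by System.Uri.
        if (uri.HostNameType == UriHostNameType.IPv4)
        {
            return true;
        }

        // Named addresses must satisfy the targeted regex.
        return NamedHost.IsMatch(uri.Host);
    }
}
```

Since System.Uri separates the host from the path and query, the regex only has to describe the host name, which keeps it short.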
bobbymcr