Hey folks. I'm struggling with a regular expression that will match input in a "single page/page range" text box, meaning the user can enter either a single integer or an integer range in a [lowerBound]-[upperBound] format. For example:


  • 11 : match
  • 2 : match
  • 2-9 : match
  • 2a : not a match
  • 19- : not a match


Is this possible with one regex?

Bonus

  • 9-2 : not a match

Thanks in advance.

+3  A: 

This works for all your test cases:

^\d+(?:-\d+)?$

EDIT: Except the last test case (9-2). Checking that the second value is greater than the first is not something regular expressions are designed to do.
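
For anyone who wants to try the pattern against the question's test cases, here is a minimal sketch in Python (the thread itself does not name a language, so that choice and the helper name `is_page_or_range` are assumptions for illustration). `re.fullmatch` anchors both ends, so `^` and `$` are left out.

```python
import re

# Same pattern as above, minus ^ and $, because fullmatch() anchors both ends.
PAGE_OR_RANGE = re.compile(r"\d+(?:-\d+)?")

def is_page_or_range(text):
    """Return True for a single integer or an integer-integer range."""
    return PAGE_OR_RANGE.fullmatch(text) is not None

# Test cases taken from the question.
for case in ["11", "2", "2-9", "2a", "19-", "9-2"]:
    print(case, is_page_or_range(case))
```

Note that "9-2" still passes here, since the pattern only checks the shape of the input, not the order of the two numbers.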

Bryan
+3  A: 
/^(\d+)(-(\d+))?$/

In Perl you can then check the order once the match succeeds ($3 is undefined for a single page):

if (!defined $3 || $1 <= $3)
+1 for suggesting a way to check the range values are ordered correctly.
Bryan
+2  A: 

This matches a single integer or a range and captures the number(s) as submatches for later use:

/^(\d+)(?:-(\d+))?$/
Helen
+9  A: 

As Bryan says, comparing two numbers is not something regular expressions are designed to do. If you wish to check for the bonus case, you should do so outside the regular expression.

/^(\d+)(?:-(\d+))?$/ && (!defined $2 || $1 < $2);
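
The same "match first, compare afterwards" idea, sketched in Python under some assumptions: the function name `parse_page_range` and the choice to return a `(lower, upper)` tuple are mine, and a range such as "5-5" is rejected to mirror the strict `$1 < $2` comparison above.

```python
import re

# Capture the first number and, optionally, the second one after a hyphen.
RANGE = re.compile(r"(\d+)(?:-(\d+))?")

def parse_page_range(text):
    """Return (lower, upper) for valid input, or None if it is malformed
    or the range is not strictly increasing."""
    match = RANGE.fullmatch(text)
    if match is None:
        return None
    lower = int(match.group(1))
    if match.group(2) is None:
        # Single page: treat it as a one-page range.
        return (lower, lower)
    upper = int(match.group(2))
    if lower >= upper:
        # Ordering is checked in code, outside the regular expression.
        return None
    return (lower, upper)
```

Doing the comparison in ordinary code keeps the pattern simple and works in any regex engine.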

That being said, most "regular expression" engines aren't actually regular, so (for example) it is possible in Perl 5:

m{                     # /../ is shorthand for m/../
    \A                 # beginning of string
    (\d+)              # first number
    (?:-               # a non-capturing group starting with '-'...
        (\d+)          #     second number
        (?(?{$1>=$2})  #     if first number is >= second number
            (?!))      #         fail this match
    )?                 # ...this group is optional
    \Z                 # end of string
}x                     # /x tells Perl to allow spaces and comments inside regex

Or /^(\d+)(?:-(\d+)(?(?{$1>=$2})(?!)))?$/ for short. Tested in Perl 5.6.1, 5.8.8, and 5.10.0.


To match the extended definition of ranges that Lee suggests,

/^\s*
    (\d+) (?:\s*-\s* (\d+))?
    (?:\s*,\s* (\d+) (?:\s*-\s* (\d+))?)*
\s*$/x
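
A rough Python equivalent of this list pattern, using `re.VERBOSE` so the layout can stay commented (the name `is_page_list` is hypothetical, and this is only a sketch of the syntax check, with no ordering check):

```python
import re

PAGE_LIST = re.compile(
    r"""
    \s*
    \d+ (?: \s*-\s* \d+ )?                  # first page or range
    (?: \s*,\s* \d+ (?: \s*-\s* \d+ )? )*   # any number of ", page" or ", range"
    \s*
    """,
    re.VERBOSE,
)

def is_page_list(text):
    """Return True if text is a comma-separated list of pages and ranges."""
    return PAGE_LIST.fullmatch(text) is not None
```

With this, inputs such as "1-5,13-23" and " 2 - 9 " are accepted, while "1-", "-1" and "1,,3" are rejected.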

Using some Perl 5.10 features, it is even possible to ensure that the numbers are well-ordered:

m{
    \A\s*                              # start of string, spaces
    (?{{$min = 0}})                    # initialize min := 0
    (?&RANGE) (?:\s*,\s* (?&RANGE))*   # a range, many (comma, range)
    \s*\Z                              # spaces, end of string

    (?(DEFINE)                         # define the named groups:
        (?<NUMBER>                     #   number :=
            (\d*)                      #     a sequence of digits
            (?(?{$min < $^N})          #     if this number is greater than min
                (?{{$min = $^N}})      #       then update min
                | (?!)))               #       else fail
        (?<RANGE>                      #   range :=
            (?&NUMBER)                 #     a number
            (?:\s*-\s* (?&NUMBER))?))  #     maybe a hyphen and another number
}x
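
In engines without embedded code, the same well-ordering rule can be applied after the match. A minimal Python sketch, assuming the rule is the one the pattern above encodes (every number must be strictly greater than the one before it, starting from 0); the name `is_ordered_page_list` is made up for this example:

```python
import re

# Syntax only: pages or ranges separated by commas, optional spaces allowed.
PAGE_LIST = re.compile(
    r"\s*\d+(?:\s*-\s*\d+)?(?:\s*,\s*\d+(?:\s*-\s*\d+)?)*\s*"
)

def is_ordered_page_list(text):
    """Return True if text is a well-formed page list whose numbers
    are strictly increasing from left to right."""
    if PAGE_LIST.fullmatch(text) is None:
        return False
    numbers = [int(n) for n in re.findall(r"\d+", text)]
    # Compare each number with its predecessor; the first is compared with 0.
    return all(prev < cur for prev, cur in zip([0] + numbers, numbers))
```

Under that rule "1-5,13-23" passes, while "9-2", "1-5,3" and "5-5" do not.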
ephemient
Now why do people say regexes are unreadable? That clearly says "Aardvarks are tasty" in Martian.
Michael Myers
I forgot to write that the first option is much preferred over the second -- I'll say it here in this comment. However, the second option is pretty readable when it's nicely indented and commented. The condensed version is unreadable, but so are many one-liners in any language. /x is good style: in fact, I believe that Perl 6 rules (name chosen as "regex" is no longer truly accurate) are /x by default.
ephemient
While I believe /x originated in Perl, Ruby has the same, Python has re.X=re.VERBOSE, Java has java.util.regex.Pattern.COMMENTS, .NET has System.Text.RegularExpressions.RegexOptions.IgnorePatternWhitespace, and so on. I don't know why this feature isn't lauded as often as it should be.
ephemient
Outstanding answer, +1 definitely. I'm using this in .Net, and all test cases (Lee's included) work except for one: 1-, 1 -, -1, - 1 all match. But those cases can easily be handled with the server-side code.
AJ
For the record, in .Net (powershell): [System.Text.RegularExpressions.Regex]::IsMatch("-1","^\s*(\d*)(?:\s*-\s*(\d*))?(?:\s*,\s*(\d*)(?:\s*-\s*(\d*))?)*\s*$") returns True
AJ
My mistake. 1-, -1, etc. should be disallowed now.
ephemient
+1  A: 

Since I'm a tester, I was happy to see a list of test cases used as a specification. For completeness, I would add the following test cases:

  • 2 - 9 : match
  • 2- 9 : match
  • 2 -9 : match
  • -1-9 : not a match

Also, even "single page or page range" is a little simplistic. I would consider supporting these additional test cases:

  • 1,3 : match
  • 1-5,13 : match
  • 1-5,13-23 : match
  • 1,13-23 : match
  • etc
Lee
It took me so long to see all of these excellent responses because I was in a meeting about this very website, where many of your test cases were brought up as possible needs, especially the "#,#-#" scenario. At first I didn't think these would be necessary, but I'm not a tester :-) +1, and thanks.
AJ