ansaurus

Question

Regular Expression to match a single page or page range?

Answer 1

+3 A:

This works for all your test cases:

^\d+(?:-\d+)?$

EDIT: Except the last test case (9-2). Checking that the second value is greater than the first is not something regular expressions are designed to do.
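
A minimal Python sketch of the same pattern, to show what it does and does not check. The function name `looks_like_page_or_range` and the sample inputs are made up for illustration; only the pattern comes from this answer.

```python
import re

# Pattern from the answer: one number, optionally followed by "-number".
PAGE_OR_RANGE = re.compile(r"^\d+(?:-\d+)?$")

def looks_like_page_or_range(text):
    """Return True if text has the shape N or N-M (order is not checked)."""
    return PAGE_OR_RANGE.match(text) is not None

# Illustrative inputs only; "9-2" has the right shape, so it still matches.
for sample in ["1", "12-15", "1-", "-1", "a", "9-2"]:
    print(sample, looks_like_page_or_range(sample))
```

The reversed range passes because the pattern only describes the shape of the input, not the relationship between the two numbers.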

Bryan 2009-06-10 19:27:00

Answer 2

+3 A:

/^(\d+)(-(\d+))?$/

If you're in Perl, you can then check the order whenever the second number is present:

if (!defined $3 || $1 <= $3) { ... }

2009-06-10 19:28:04

+1 for suggesting a way to check the range values are ordered correctly.

Bryan 2009-06-10 19:36:54

Answer 3

+2 A:

This matches a single integer or a range and captures the number(s) as submatches for later use:

/^(\d+)(?:-(\d+))?$/
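
One way the captured submatches might be used later, sketched in Python. The helper name `parse_page_spec` and its return convention (a tuple, with `None` for a missing end page) are assumptions for illustration, not part of the answer.

```python
import re

# Same pattern as above: group 1 is the first number, group 2 the optional second.
CAPTURING = re.compile(r"^(\d+)(?:-(\d+))?$")

def parse_page_spec(text):
    """Return (first, last) as ints, with last=None for a single page.

    Returns None when the text is not a page or a page range.
    """
    m = CAPTURING.match(text)
    if m is None:
        return None
    first, last = m.groups()  # last is None when the optional group did not match
    return int(first), (int(last) if last is not None else None)
```

Because the optional group is non-capturing on the outside, the second number is always group 2, whether or not it is present.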

Helen 2009-06-10 19:40:48

Answer 4

+9 A:

As Bryan says, comparing two numbers is not something regular expressions are designed to do. If you wish to check for the bonus case, you should do so outside the regular expression.

/^(\d+)(?:-(\d+))?$/ && (!defined $2 || $1 < $2);
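
The same "match first, compare afterwards" idea, as a rough Python sketch. The function name `is_valid_page_spec` is hypothetical; the strict `<` comparison follows the Perl line above, so a range such as 4-4 is rejected.

```python
import re

PAGE_SPEC = re.compile(r"^(\d+)(?:-(\d+))?$")

def is_valid_page_spec(text):
    """Accept a single page, or a range whose first number is below the second."""
    m = PAGE_SPEC.match(text)
    if m is None:
        return False
    first, last = m.group(1), m.group(2)
    if last is None:
        # Single page: there is no second number to compare against.
        return True
    return int(first) < int(last)
```

The comparison is done on integers, not on the matched strings, so 9-10 is accepted and 10-9 is rejected.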

That being said, most "regular expression" engines are more powerful than true regular expressions, so the check can be done inside the pattern itself. For example, in Perl 5:

m{                     # /../ is shorthand for m/../
    \A                 # beginning of string
    (\d+)              # first number
    (?:-               # a non-capturing group starting with '-'...
        (\d+)          #     second number
        (?(?{$1>=$2})  #     if first number is >= second number
            (?!))      #         fail this match
    )?                 # ...this group is optional
    \Z                 # end of string
}x                     # /x tells Perl to allow spaces and comments inside regex

Or /^(\d+)(?:-(\d+)(?(?{$1>=$2})(?!)))?$/ for short. Tested in Perl 5.6.1, 5.8.8, and 5.10.0.

To match the extended definition of ranges that Lee suggests (comma-separated pages and ranges, with optional spaces):

/^\s*
    (\d+) (?:\s*-\s* (\d+))?
    (?:\s*,\s* (\d+) (?:\s*-\s* (\d+))?)*
\s*$/x
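
For readers outside Perl, here is a sketch of the same extended pattern using Python's `re.VERBOSE` flag, which plays the role of /x. The name `PAGE_LIST` and the comments are additions for illustration; the pattern itself is transcribed from the Perl above.

```python
import re

# re.VERBOSE ignores whitespace in the pattern and allows "#" comments,
# much like Perl's /x modifier.
PAGE_LIST = re.compile(r"""
    ^\s*
    (\d+) (?:\s*-\s* (\d+))?                # first page or range
    (?:\s*,\s* (\d+) (?:\s*-\s* (\d+))?)*   # further pages or ranges after commas
    \s*$
""", re.VERBOSE)

def is_page_list(text):
    """Return True for inputs such as "1-5,13" or "2 - 9" (order is not checked)."""
    return PAGE_LIST.match(text) is not None
```

Like the Perl version, this only validates the shape of the list; it does not verify that the numbers are in ascending order.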

Using some Perl 5.10 features, it is even possible to ensure that the numbers are well-ordered:

m{
    \A\s*                              # start of string, spaces
    (?{{$min = 0}})                    # initialize min := 0
    (?&RANGE) (?:\s*,\s* (?&RANGE))*   # a range, many (comma, range)
    \s*\Z                              # spaces, end of string

    (?(DEFINE)                         # define the named groups:
        (?<NUMBER>                     #   number :=
            (\d+)                      #     one or more digits
            (?(?{$min < $^N})          #     if this number is greater than min
                (?{{$min = $^N}})      #       then update min
                | (?!)))               #       else fail
        (?<RANGE>                      #   range :=
            (?&NUMBER)                 #     a number
            (?:\s*-\s* (?&NUMBER))?))  #     maybe a hyphen and another number
}x
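
Python's `re` module has no equivalent of Perl's embedded code blocks, so a Python version would have to do the ordering check outside the pattern. The sketch below mirrors the intent of the Perl regex (every number must be strictly greater than the one before it, starting from a minimum of 0); the function name `is_well_ordered_page_list` is hypothetical.

```python
import re

NUMBER = re.compile(r"\d+")
# Shape check only: pages or ranges separated by commas, optional spaces.
PAGE_LIST = re.compile(
    r"^\s*\d+(?:\s*-\s*\d+)?(?:\s*,\s*\d+(?:\s*-\s*\d+)?)*\s*$"
)

def is_well_ordered_page_list(text):
    """Return True if text is a page list whose numbers strictly increase."""
    if PAGE_LIST.match(text) is None:
        return False
    minimum = 0  # same starting point as $min in the Perl version
    for token in NUMBER.findall(text):
        value = int(token)
        if value <= minimum:
            return False
        minimum = value
    return True
```

As in the Perl version, starting the minimum at 0 means a page number of 0 is rejected, and so is a range like 3-3.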

ephemient 2009-06-10 19:57:23

Now why do people say regexes are unreadable? That clearly says "Aardvarks are tasty" in Martian.

Michael Myers 2009-06-10 20:06:07

I forgot to write that the first option is much preferred over the second, so I'll say it here in this comment. However, the second option is pretty readable when it's nicely indented and commented. The condensed version is unreadable, but so are many one-liners in any language. /x is good style: in fact, I believe that Perl 6 rules (so named because "regex" is no longer truly accurate) are /x by default.

ephemient 2009-06-10 20:14:45

While I believe /x originated in Perl, Ruby has the same, Python has re.X=re.VERBOSE, Java has java.util.regex.Pattern.COMMENTS, .NET has System.Text.RegularExpressions.RegexOptions.IgnorePatternWhitespace, and so on. I don't know why this feature isn't lauded as often as it should be.

ephemient 2009-06-10 21:51:08

Outstanding answer, +1 definitely. I'm using this in .NET, and all test cases (Lee's included) work except for one group: 1-, 1 -, -1, and - 1 all match. But those cases can easily be handled in the server-side code.

AJ 2009-06-11 01:31:47

For the record, in .NET (PowerShell):

[System.Text.RegularExpressions.Regex]::IsMatch("-1","^\s*(\d*)(?:\s*-\s*(\d*))?(?:\s*,\s*(\d*)(?:\s*-\s*(\d*))?)*\s*$")

returns True.

AJ 2009-06-11 01:33:59

My mistake. 1-, -1, etc. should be disallowed now.

ephemient 2009-06-11 01:48:35
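
The exchange above comes down to `\d*` versus `\d+`: with `\d*`, each number may be empty, so a lone hyphen with a number on only one side still matches. A small Python sketch of the difference follows; the names `LOOSE` and `STRICT` are made up here, and the loose pattern is the one from AJ's PowerShell test.

```python
import re

# \d* allows an empty "number", so "1-" and "-1" slip through.
LOOSE = re.compile(
    r"^\s*(\d*)(?:\s*-\s*(\d*))?(?:\s*,\s*(\d*)(?:\s*-\s*(\d*))?)*\s*$"
)
# \d+ requires at least one digit wherever a number is expected.
STRICT = re.compile(
    r"^\s*(\d+)(?:\s*-\s*(\d+))?(?:\s*,\s*(\d+)(?:\s*-\s*(\d+))?)*\s*$"
)

for sample in ["1-", "1 -", "-1", "- 1", "1-5,13"]:
    print(repr(sample), bool(LOOSE.match(sample)), bool(STRICT.match(sample)))
```

The loose pattern also accepts the empty string, which is another reason to prefer `\d+` here.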

Answer 5

+1 A:

Since I'm a tester, I was happy to see a list of test cases used as a specification. For completeness, I would add the following test cases:

2 - 9 : match
2- 9 : match
2 -9 : match
-1-9 : not a match

Also, even "single page or page range" is a little simplistic. I would consider supporting these additional test cases:

1,3 : match
1-5,13 : match
1-5,13-23 : match
1,13-23 : match
etc

Lee 2009-06-10 20:55:16

It took me so long to see all of these excellent responses because I was in a meeting about this very website, where many of your test cases were brought up as possible needs, especially the "#,#-#" scenario. At first I didn't think these would be necessary, but I'm not a tester :-) +1, and thanks.

AJ 2009-06-11 01:24:10
