Hi,

I am using ExtJS. One of the text fields built with an ExtJS component should accept comma-separated number/operator strings like the following three examples:

1, 2-3, 4..5, <6, <=7, >8, >=9

2, 3..5, >=9,>10

<=9, 1, <=8, 4..5, 8-9

Here I am using the equals, range (-), sequence (..), and less than/greater than (or equal to) operators on numbers less than or equal to 100. The terms are separated by commas.

What regular expression would validate this type of string?

For my previous question, I got this solution from "dlamblin": ^(?:\d+(?:(?:..|-)\d+)?|[<>]=?\d+)(?:,\s*\d+(?:(?:..|-)\d+)?|[<>]=?\d+)*$

This works perfectly for all patterns except the following:

  1. It only works if the relational operators (<, <=, >, >=) appear in the first element of the string. E.g. <=3, 4-5, 6, 7..8 ---> works perfectly, but <=3, 4-5, 6, 7..8, >=5 ---> fails, because the relational operator is not in the first element.

  2. The string <3<4, 5, 9-4 does not give any error, i.e. it is accepted even though a comma is needed between <3 and <4.

  3. Numbers in the string should be less than or equal to 100, e.g. <100, 0-100, 99..100.

  4. It should not allow leading zeros (like 003, 099).
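
The four problems listed above can be reproduced with a short JavaScript check. This is only a sketch: the pattern is copied exactly as quoted above (including its unescaped dots), and the names `quoted` and `samples` are made up for illustration.

```javascript
// Pattern exactly as quoted in the question (the dots are unescaped there).
const quoted = /^(?:\d+(?:(?:..|-)\d+)?|[<>]=?\d+)(?:,\s*\d+(?:(?:..|-)\d+)?|[<>]=?\d+)*$/;

// Each sample is paired with the result the rules in the question call for.
const samples = [
  ["<=3, 4-5, 6, 7..8", true],
  ["<=3, 4-5, 6, 7..8, >=5", true], // operator after a comma
  ["<3<4, 5, 9-4", false],          // missing comma
  ["101", false],                   // above 100
  ["003", false],                   // leading zeros
];

for (const [text, wanted] of samples) {
  console.log(JSON.stringify(text), "wanted:", wanted, "got:", quoted.test(text));
}
```

Only the first sample gets the wanted result. The second alternative inside the repeated group has no comma in front of it, which explains cases 1 and 2, and `\d+` puts no limit on the digits, which explains cases 3 and 4.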

Please help me modify this regular expression. Thanks in advance.

Atul

+6  A: 

Scrap that and use a tokenizer instead. Split up the string by commas, then look at each token and decide (possibly using a regular expression) which type of relationship it is. If it's none of the existing relationships, it's invalid. If any relationship contains a number that's too big, it's invalid.

For the sake of your sanity and the people who will have to maintain this code after you're done with it, don't use regular expressions to validate such a complicated interrelated set of rules. Break it down into simpler chunks.
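
A minimal sketch of that approach in JavaScript (since the question uses ExtJS). The function names `isNumber`, `classify` and `validate` are hypothetical, and the sketch assumes no whitespace is allowed between an operator and its number:

```javascript
// Hypothetical helper: true if text is a number from 0 to 100 with no leading zeros.
function isNumber(text) {
  return /^(?:0|[1-9][0-9]?|100)$/.test(text);
}

// Hypothetical helper: name the kind of one token, or return null if it fits no known form.
function classify(token) {
  let m;
  if ((m = /^(<=|>=|<|>)(\d+)$/.exec(token))) {
    return isNumber(m[2]) ? "comparison" : null;
  }
  if ((m = /^(\d+)(\.\.|-)(\d+)$/.exec(token))) {
    if (!isNumber(m[1]) || !isNumber(m[3])) return null;
    return m[2] === "-" ? "range" : "sequence";
  }
  return isNumber(token) ? "equals" : null;
}

// Split on commas, trim each token, and require every token to classify.
function validate(input) {
  return input.split(",").every((raw) => classify(raw.trim()) !== null);
}

console.log(validate("<=3, 4-5, 6, 7..8, >=5")); // true
console.log(validate("<3<4, 5, 9-4"));           // false
```

Each rule lives in one small place, so a new operator or a different upper limit means changing one function rather than re-deriving a long pattern.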

Welbog
+2  A: 

Welbog's advice to use a tokenizer is the sane option.

If you have some other constraint that forces a regular expression, you can use

^(<|<=|>|>=)?\s*(100|0|[1-9]\d?)((\.\.|-)(100|0|[1-9]\d?))?(,\s*(<|<=|>|>=)?\s*(100|0|[1-9]\d?)((\.\.|-)(100|0|[1-9]\d?))?)*$

That's the result of manually expanding the following:

num   = (100|0|[1-9]\d?)
op    = (<|<=|>|>=)
range = op?\s*num((\.\.|-)num)?
expr  = ^range(,\s*range)*$
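
Rather than expanding by hand, the same pieces can be assembled in code. The following is a sketch in JavaScript, assuming non-capturing groups are acceptable (the hand-expanded pattern above uses capturing ones); the variable names simply mirror the breakdown:

```javascript
// Named pieces, mirroring the num / op / range / expr breakdown.
const num = "(?:100|0|[1-9]\\d?)";
const op = "(?:<=|>=|<|>)";
const range = `${op}?\\s*${num}(?:(?:\\.\\.|-)${num})?`;
const expr = new RegExp(`^${range}(?:,\\s*${range})*$`);

console.log(expr.test("<=3, 4-5, 6, 7..8, >=5")); // true
console.log(expr.test("<3<4, 5, 9-4"));           // false
console.log(expr.test("003"));                    // false
```

Note that, as the breakdown is written, an operator may be followed by a range, so a term like `<=3-5` is accepted too.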
Greg Bacon
+1 for catching the `[1-9]\d?` section that Tim didn't manage to catch. The two answers together make a decent regex answer. Even though it's not that good an option.
Welbog
+1  A: 

This should work:

^(?:(?:\s*((?:\<|\>|\<\=|\>\=)?(?:[1-9]|[1-9]\d|100))\s*(?:,|$))|(?:\s*((?:[1-9]|[1-9]\d|100)(?:\.\.|\-)(?:[1-9]|[1-9]\d|100))\s*(?:,|$)))+$

(You'll need to use the "multiline" option, obviously.)

If you have the advantage of a regex engine that supports the "ignore whitespace" option, then you could break it up like this:

^                           # beginning of line
(?:   
  (?:
    \s*                     # any whitespace
    (                       # capture group
      (?:<|>|<=|>=)?        # inequality
      (?:[1-9]|[1-9]\d|100) # single value
    )
    \s*                     # any whitespace
    (?:,|$)                 # comma or end of line
  )
  |
  (?:
    \s*                     # any whitespace
    (                       # capture group
      (?:[1-9]|[1-9]\d|100) # single value
      (?:\.\.|\-)           # range modifier
      (?:[1-9]|[1-9]\d|100) # single value
    )
    \s*                     # any whitespace
    (?:,|$)                 # comma or end of line
  )
)+                          # one or more of all this
$                           # end of line
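
As far as I know, JavaScript's built-in RegExp has no "ignore whitespace" flag, so in an ExtJS setting the commented layout has to be emulated. A rough sketch: the helper `verbose` is hypothetical, and it assumes the pattern contains no literal `#` and no literal spaces of its own.

```javascript
// Hypothetical helper: strip "# ..." comments and all whitespace from a
// verbose pattern, then compile it. Assumes the pattern contains no literal
// "#" and no literal space characters of its own.
function verbose(source, flags) {
  const compact = source
    .split("\n")
    .map((line) => line.replace(/#.*$/, ""))
    .join("")
    .replace(/\s+/g, "");
  return new RegExp(compact, flags);
}

// String.raw keeps the backslashes exactly as typed.
const listPattern = verbose(String.raw`
^                           # beginning of line
(?:
  (?:
    \s*                     # any whitespace
    (                       # capture group
      (?:<|>|<=|>=)?        # inequality
      (?:[1-9]|[1-9]\d|100) # single value
    )
    \s*                     # any whitespace
    (?:,|$)                 # comma or end of line
  )
  |
  (?:
    \s*                     # any whitespace
    (                       # capture group
      (?:[1-9]|[1-9]\d|100) # single value
      (?:\.\.|\-)           # range modifier
      (?:[1-9]|[1-9]\d|100) # single value
    )
    \s*                     # any whitespace
    (?:,|$)                 # comma or end of line
  )
)+                          # one or more of all this
$                           # end of line
`);

console.log(listPattern.test("1, 2-3, 4..5, <6, <=7, >8, >=9")); // true
console.log(listPattern.test("<3<4, 5, 9-4"));                   // false
```

Two things to be aware of with this particular pattern: it rejects `0`, and because every term ends in "comma or end of line" it accepts a trailing comma such as `1,`.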

As you can see, it matches your examples in Expresso:

http://imgur.com/5ctQS.png

Tim Sylvester
+1. If you must use a regular expression in this situation, please for the love of lasers document it like this.
Welbog
And who doesn't love lasers‽
Tim Sylvester
+1  A: 

I agree with Welbog that pre/post-processing would be a better choice.

But since I like to do regex, here is my solution.

^[ \t]*(?:(?:0|[1-9][0-9]?|100)(?:(?:\-|\.\.)(?:0|[1-9][0-9]?|100))?|(?:[<>]=?)(?:0|[1-9][0-9]?|100))(?:[ \t]*,[ \t]*(?:(?:0|[1-9][0-9]?|100)(?:(?:\-|\.\.)(?:0|[1-9][0-9]?|100))?|(?:[<>]=?)(?:0|[1-9][0-9]?|100)))*[ \t]*$

'\s' is not used as it may include '\n' in some engines.

'\d' is not used because [1-9] is needed anyway, so [0-9] is easier to read next to it.

'(?:0|[1-9][0-9]?|100)' will match a number from 0 to 100 without leading zeros.

'(?:[<>]=?)(?:0|[1-9][0-9]?|100)' will match a condition followed by a number (if you want to match '=' too, just adjust it).

'(?:0|[1-9][0-9]?|100)(?:(?:\-|\.\.)(?:0|[1-9][0-9]?|100))?' will match a number with optional range or sequence.
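
One detail about the number piece is worth seeing in code: the order of the alternatives only works out because the full pattern is anchored. A small JavaScript sketch (the variable names are made up for illustration):

```javascript
// Unanchored: the second alternative wins first, so only "10" is consumed.
const num = /(?:0|[1-9][0-9]?|100)/;
console.log(num.exec("100")[0]); // "10"

// Anchored: "10" followed by end-of-string fails, so the engine backtracks to "100".
const anchored = /^(?:0|[1-9][0-9]?|100)$/;
console.log(anchored.test("100")); // true
console.log(anchored.test("101")); // false
console.log(anchored.test("003")); // false

// Putting the longest alternative first makes the piece safe to reuse unanchored.
const reordered = /(?:100|[1-9][0-9]?|0)/;
console.log(reordered.exec("100")[0]); // "100"
```

Inside the full expression this is harmless, since whatever follows the number (a comma, a range operator, or the end of the string) forces the backtracking.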

Full explanation:

^
[ \t]*  // Prefix spaces
(?: // A valid term
    // A number
    (?:0|[1-9][0-9]?|100)
    // Optional range or sequence
    (?:
        (?:\-|\.\.)
        (?:0|[1-9][0-9]?|100)
    )?
    |
    // Condition and number
    (?:[<>]=?)(?:0|[1-9][0-9]?|100)
)
(?: // Other terms
    [ \t]*,[ \t]*   // Comma with prefix and suffix spaces
    (?: // A valid term
        // A number
        (?:0|[1-9][0-9]?|100)
        // Optional range or sequence
        (?:
            (?:\-|\.\.)
            (?:0|[1-9][0-9]?|100)
        )?
        |
        // Condition and number
        (?:[<>]=?)(?:0|[1-9][0-9]?|100)
    )
)*
[ \t]*  // Tail spaces
$

I tested it with the regex search in Eclipse and it works.

Hope this helps.

NawaMan
Another decent one, but I'd use `\s*` instead of `[ \t]*` to catch other types of spaces (like some sneaky Unicode ones).
Welbog
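
The trade-off between the two whitespace classes can be checked directly. A sketch for JavaScript's engine only (other engines may define `\s` differently); the third pattern is one possible middle ground, not something proposed in the answers above:

```javascript
const anyWs = /^\s$/;           // whatever the engine counts as whitespace
const spaceTab = /^[ \t]$/;     // only space and tab
const noBreaks = /^[^\S\r\n]$/; // whitespace except CR and LF

const samples = { space: " ", tab: "\t", newline: "\n", "no-break space": "\u00A0" };

for (const [name, ch] of Object.entries(samples)) {
  console.log(name, anyWs.test(ch), spaceTab.test(ch), noBreaks.test(ch));
}
```

In JavaScript, `\s` accepts all four samples, `[ \t]` accepts only the space and the tab, and `[^\S\r\n]` accepts everything except the newline.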