ansaurus

Question

Need help to modify this complex regular expression

Answer 1

+6 A:

Scrap that and use a tokenizer instead. Split up the string by commas, then look at each token and decide (possibly using a regular expression) which type of relationship it is. If it's none of the existing relationships, it's invalid. If any relationship contains a number that's too big, it's invalid.

For the sake of your sanity and the people who will have to maintain this code after you're done with it, don't use regular expressions to validate such a complicated interrelated set of rules. Break it down into simpler chunks.
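Welbog's split-then-classify approach could look something like this in Python. The exact rules from the question are not reproduced on this page, so this sketch infers them from the regexes in the other answers and treats them as assumptions. Each comma-separated term is a number from 0 to 100 without leading zeros. A term may be prefixed by <, <=, > or >=, or it may be a range written a..b or a-b. `is_valid`, `number_ok` and `MAX_VALUE` are hypothetical names.

```python
import re

# ASSUMED rules, inferred from the regexes in the other answers rather than
# from the original question: a term is a number 0-100 (no leading zeros),
# optionally prefixed by <, <=, > or >=, or a range "a..b" / "a-b".
COMPARISON = re.compile(r"(<=|>=|<|>)?\s*(\d+)")
RANGE = re.compile(r"(\d+)(?:\.\.|-)(\d+)")

MAX_VALUE = 100  # hypothetical upper bound


def number_ok(digits):
    """Reject leading zeros such as "07" and values above MAX_VALUE."""
    if len(digits) > 1 and digits.startswith("0"):
        return False
    return int(digits) <= MAX_VALUE


def is_valid(spec):
    """Split on commas, then classify and check each token separately."""
    for token in (t.strip() for t in spec.split(",")):
        match = RANGE.fullmatch(token)
        if match:
            numbers = match.groups()
        else:
            match = COMPARISON.fullmatch(token)
            if not match:
                return False  # not any known kind of relationship
            numbers = (match.group(2),)
        if not all(number_ok(n) for n in numbers):
            return False  # a number is too big (or badly written)
    return True


print(is_valid("5, 10..20, >=50"))  # True
print(is_valid("5, 101"))           # False
```

Each rule lives in its own small pattern or function. A new kind of relationship becomes one more branch, not another level of nested alternation.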

Welbog 2009-11-16 18:25:32

Answer 2

+2 A:

Welbog's advice to use a tokenizer is the sane option.

If you have some other constraint that forces a regular expression, you can use

^(<|<=|>|>=)?\s*(100|0|[1-9]\d?)((\.\.|-)(100|0|[1-9]\d?))?(,\s*(<|<=|>|>=)?\s*(100|0|[1-9]\d?)((\.\.|-)(100|0|[1-9]\d?))?)*$

That's the result of manually expanding the following:

num   = (100|0|[1-9]\d?)
op    = (<|<=|>|>=)
range = op?\s*num((\.\.|-)num)?
expr  = ^range(,\s*range)*$
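The program can do the same expansion for you, so nobody has to do it by hand. This is a minimal sketch that assumes Python's `re` module. The variable names mirror the definitions above, except that `rng` stands in for `range`, which would shadow Python's built-in.

```python
import re

# Each piece mirrors one line of the definitions above; f-string
# interpolation does the "manual expansion".
num = r"(100|0|[1-9]\d?)"
op = r"(<|<=|>|>=)"
rng = rf"{op}?\s*{num}((\.\.|-){num})?"  # "range" would shadow the built-in
expr = rf"^{rng}(,\s*{rng})*$"

PATTERN = re.compile(expr)

for text in ["5, 10..20, >=50", "101", "007"]:
    print(repr(text), bool(PATTERN.match(text)))
```

The resulting string is identical to the long pattern above. Because the pattern is built from its parts, both copies of `range` are guaranteed to stay in sync when one definition changes.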

Greg Bacon 2009-11-16 18:44:05

+1 for catching the `[1-9]\d?` section that Tim didn't manage to catch. Together, the two answers make a decent regex answer, even though a regex isn't that good an option here.

Welbog 2009-11-16 18:55:43

Answer 3

+1 A:

This should work:

^(?:(?:\s*((?:\<|\>|\<\=|\>\=)?(?:[1-9]|[1-9]\d|100))\s*(?:,|$))|(?:\s*((?:[1-9]|[1-9]\d|100)(?:\.\.|\-)(?:[1-9]|[1-9]\d|100))\s*(?:,|$)))+$

(If you are validating several lines in one string, you'll need the "multiline" option so that ^ and $ match at each line.)

If you have the advantage of a regex engine that supports the "ignore whitespace" option, then you could break it up like this:

^                           # beginning of line
(?:   
  (?:
    \s*                     # any whitespace
    (                       # capture group
      (?:<|>|<=|>=)?        # inequality
      (?:[1-9]|[1-9]\d|100) # single value
    )
    \s*                     # any whitespace
    (?:,|$)                 # comma or end of line
  )
  |
  (?:
    \s*                     # any whitespace
    (                       # capture group
      (?:[1-9]|[1-9]\d|100) # single value
      (?:\.\.|\-)           # range modifier
      (?:[1-9]|[1-9]\d|100) # single value
    )
    \s*                     # any whitespace
    (?:,|$)                 # comma or end of line
  )
)+                          # one or more of all this
$                           # end of line
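In Python, the "ignore whitespace" option is `re.VERBOSE`. The sketch below compiles the commented pattern above in that mode and assumes Python 3's `re` module. `TERMS` is an illustrative name.

```python
import re

# Tim Sylvester's commented pattern, compiled with re.VERBOSE (Python's
# "ignore whitespace" mode), so the layout and # comments are ignored.
TERMS = re.compile(r"""
    ^                           # beginning of line
    (?:
      (?:
        \s*                     # any whitespace
        (                       # capture group
          (?:<|>|<=|>=)?        # inequality
          (?:[1-9]|[1-9]\d|100) # single value
        )
        \s*                     # any whitespace
        (?:,|$)                 # comma or end of line
      )
      |
      (?:
        \s*                     # any whitespace
        (                       # capture group
          (?:[1-9]|[1-9]\d|100) # single value
          (?:\.\.|\-)           # range modifier
          (?:[1-9]|[1-9]\d|100) # single value
        )
        \s*                     # any whitespace
        (?:,|$)                 # comma or end of line
      )
    )+                          # one or more of all this
    $                           # end of line
""", re.VERBOSE)

for text in ["5, 10..20, >=50", "101", "0", "5,"]:
    print(repr(text), bool(TERMS.match(text)))
```

Checked this way, the pattern rejects 0, because each single value is limited to 1 through 100. It also accepts a trailing comma such as "5,", because a term may end with either a comma or the end of the line.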

I checked it against your examples in Expresso, and it matches them.

Tim Sylvester 2009-11-16 18:51:39

+1. If you must use a regular expression in this situation, please for the love of lasers document it like this.

Welbog 2009-11-16 18:54:27

And who doesn't love lasers‽

Tim Sylvester 2009-11-16 19:00:18

Answer 4

+1 A:

I agree with Welbog that pre- or post-processing is the better choice.

But since I like regexes, here is my solution.

^[ \t]*(?:(?:0|[1-9][0-9]?|100)(?:(?:\-|\.\.)(?:0|[1-9][0-9]?|100))?|(?:[<>]=?)(?:0|[1-9][0-9]?|100))(?:[ \t]*,[ \t]*(?:(?:0|[1-9][0-9]?|100)(?:(?:\-|\.\.)(?:0|[1-9][0-9]?|100))?|(?:[<>]=?)(?:0|[1-9][0-9]?|100)))*[ \t]*$

'\s' is not used because in some engines it also matches '\n'.

'\d' is not used because you need [1-9] anyway, so [0-9] is easier to use alongside it.

'(?:0|[1-9][0-9]?|100)' matches a number from 0 to 100 without leading zeros.

'(?:[<>]=?)(?:0|[1-9][0-9]?|100)' matches a comparison operator followed by a number (if you want to match '=' too, just adjust it).

'(?:0|[1-9][0-9]?|100)(?:(?:\-|\.\.)(?:0|[1-9][0-9]?|100))?' matches a number with an optional range or sequence.
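The claim about the number subpattern can be checked exhaustively. This short sketch assumes Python's `re.fullmatch`, which requires the whole string to match, because the bare subpattern has no anchors of its own.

```python
import re

# NawaMan's number subpattern, anchored by fullmatch().
NUMBER = re.compile(r"(?:0|[1-9][0-9]?|100)")

candidates = [str(n) for n in range(1000)] + ["00", "05", "010", "0100"]
accepted = [s for s in candidates if NUMBER.fullmatch(s)]

# Exactly the strings "0" through "100" survive; leading zeros are rejected.
print(accepted == [str(n) for n in range(101)])
```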

Full explanation:

^
[ \t]*  // Prefix spaces
(?: // A valid term
    // A number
    (?:0|[1-9][0-9]?|100)
    // Optional range or sequence
    (?:
        (?:\-|\.\.)
        (?:0|[1-9][0-9]?|100)
    )?
    |
    // Condition and number
    (?:[<>]=?)(?:0|[1-9][0-9]?|100)
)
(?: // Other terms
    [ \t]*,[ \t]*   // Comma with prefix and suffix spaces
    (?: // A valid term
        // A number
        (?:0|[1-9][0-9]?|100)
        // Optional range or sequence
        (?:
            (?:\-|\.\.)
            (?:0|[1-9][0-9]?|100)
        )?
        |
        // Condition and number
        (?:[<>]=?)(?:0|[1-9][0-9]?|100)
    )
)*
[ \t]*  // Tail spaces
$       // End of line

I tested it with Eclipse's regex search, and it works.

Hope this helps.

NawaMan 2009-11-16 18:54:50

Another decent one, but I'd use `\s*` instead of `[ \t]*` to catch other types of spaces (like some sneaky Unicode ones).

Welbog 2009-11-16 18:58:07
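Building both variants from the same term shows the trade-off between NawaMan's `[ \t]*` and Welbog's `\s*`. This sketch assumes Python 3's `re` module, where `\s` in a str pattern matches line breaks and also Unicode spaces such as U+00A0. `build`, `TABS_ONLY` and `ANY_SPACE` are illustrative names. `fullmatch` takes the place of the `^` and `$` anchors, because Python's `$` also matches just before a trailing newline, which would blur the comparison.

```python
import re

# NawaMan's building blocks: a number 0-100 and a single term.
NUM = r"(?:0|[1-9][0-9]?|100)"
TERM = rf"(?:{NUM}(?:(?:-|\.\.){NUM})?|[<>]=?{NUM})"


def build(space):
    """Comma-separated terms, allowing `space`* around terms and commas."""
    return re.compile(rf"{space}*{TERM}(?:{space}*,{space}*{TERM})*{space}*")


TABS_ONLY = build(r"[ \t]")  # NawaMan: spaces and tabs only
ANY_SPACE = build(r"\s")     # Welbog: any whitespace, including Unicode

for text in ["5, 10..20, >=50", "5,\n6", "5,\u00a06"]:
    print(repr(text), bool(TABS_ONLY.fullmatch(text)), bool(ANY_SPACE.fullmatch(text)))
```

NawaMan wanted to avoid exactly the line-break case, and the non-breaking space is the kind of "sneaky" space Welbog had in mind. Which behavior is right depends on where the input comes from.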
