I need to find all four-digit numbers between 2000 and 3000.

There may be letters before and after the number.

I thought I could use [2000-3000]{4}, but it doesn't work. Why?

Thank you.

+3  A: 

Hmm, tricky one. The dash - only applies to the characters immediately before and after it, so what your regex actually matches is exactly 4 characters, each between 0 and 3 inclusive (i.e., 0, 1, 2 or 3), e.g. 3210, 1230, 3333, etc. Try the expression below.

(2[0-9]{3})|(3000)
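
To see the difference concretely, here is a minimal Python sketch. The sample strings and variable names are made up for illustration, and `fullmatch` is used only so each sample is tested as a whole string.

```python
import re

# [2000-3000] is a character class: the literals 2, 0, 0, the range 0-3,
# then the literals 0, 0, 0. It is equivalent to [0-3].
char_class = re.compile(r"[2000-3000]{4}")
fixed = re.compile(r"(2[0-9]{3})|(3000)")

samples = ["3210", "1230", "2500", "3000", "3001"]

# Strings made only of the digits 0-3 pass the character-class version.
class_hits = [s for s in samples if char_class.fullmatch(s)]
# Only 2000-2999 and 3000 pass the alternation.
fixed_hits = [s for s in samples if fixed.fullmatch(s)]

print(class_hits)  # ['3210', '1230', '3000', '3001']
print(fixed_hits)  # ['2500', '3000']
```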

ruibm
This works fine so far.
snarebold
See the thread above: it doesn't work if I change the range.
snarebold
+21  A: 

How about

^2\d{3}|3000$

Or as Amarghosh & Bart K. & jleedev pointed out, to match multiple instances

\b(?:2[0-9]{3}|3000)\b

If you need to match a3000 or 3000a but not 13000, you would need a negative lookahead and a negative lookbehind, like:

(?<![0-9])(?:2[0-9]{3}|3000)(?![0-9])
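
As a rough illustration of how the word-boundary and lookaround versions differ when letters touch the number, here is a small Python sketch. The sample text is invented for the example.

```python
import re

text = "a3000 x2500y 13000 12000 2999 3001"

# \b needs a word/non-word transition, and letters are word characters,
# so numbers glued to letters are skipped.
word_boundary = re.compile(r"\b(?:2[0-9]{3}|3000)\b")

# The lookarounds only forbid a neighbouring digit, so letters are allowed.
lookaround = re.compile(r"(?<![0-9])(?:2[0-9]{3}|3000)(?![0-9])")

boundary_hits = word_boundary.findall(text)
lookaround_hits = lookaround.findall(text)

print(boundary_hits)    # ['2999']
print(lookaround_hits)  # ['3000', '2500', '2999']
```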
S.Mark
+1 though I think start/end markers are not really required since we're going to capture all numbers.
queen3
Is `\d` guaranteed to match just `[0-9]` or does it depend on the locale?
jleedev
`12000` shouldn't match as `2000`, should it?
S.Mark
Add a word boundary `\b` instead of `^` and `$` @S.Mark Did you make it CW to stay on 10K for ever?
Amarghosh
That matches `13000` too (or `a3000`). It matches `^2\d{3}` or `3000$`: you need to add some parentheses.
Bart Kiers
@Amarghosh, for today at least. :D
S.Mark
@S.Mark Apparently you are safe for the day anyway - I tried to spoil your plans by up-voting on another post but you seem to have hit the rep-cap ;) You'd get a nice answer badge though.
Amarghosh
Now I changed it to (?:3[0-9]{3}|10000) to get everything between 3000 and 10000, but it won't find 4000.
snarebold
it would be `(?:[3-9][0-9]{3}|10000)`
S.Mark
@S.Mark, from the question "It can be that letters are before and after"
Ash
It works for my solution now, but I had to remove the `?:`. What does `?:` mean? Thank you all.
snarebold
@snarebold `?:` makes a parenthesized group non-capturing. Consider the string `"abbbcccd"` and the regexes `a(b+)(c+)d` and `a(?:b+)(c+)d`. The match results of the first regex will contain `bbb` in the first group and `ccc` in the second group. The results of the second regex will contain `ccc` in its first (and only) group, as the `(b+)` group was made non-capturing using `?:`. See http://www.regular-expressions.info/brackets.html for more info.
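
The same example as a short Python sketch (Python is just one engine that supports `(?:...)`; the variable names are illustrative):

```python
import re

# Two capturing groups: both b+ and c+ are recorded.
capturing = re.fullmatch(r"a(b+)(c+)d", "abbbcccd")

# b+ is still matched, but (?:...) does not record it as a group.
non_capturing = re.fullmatch(r"a(?:b+)(c+)d", "abbbcccd")

capturing_groups = capturing.groups()
non_capturing_groups = non_capturing.groups()

print(capturing_groups)      # ('bbb', 'ccc')
print(non_capturing_groups)  # ('ccc',)
```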
Amarghosh
Please take a look here http://msdn.microsoft.com/en-us/library/1400241x(VS.85).aspx and search for `(?:pattern)`
S.Mark
+2  A: 

Here's an explanation of why it fails, along with ways to match numeric ranges: http://www.regular-expressions.info/numericranges.html

queen3
+2  A: 

The correct regex is \b(2\d{3}|3000)\b. That means: match the character '2' followed by exactly three digits (which matches anything from 2000 to 2999), or match '3000'. Here are some good tutorials on regular expressions:

  1. http://gnosis.cx/publish/programming/regular_expressions.html
  2. http://immike.net/blog/2007/04/06/the-absolute-bare-minimum-every-programmer-should-know-about-regular-expressions/
  3. http://www.regular-expressions.info/
Rorick
A: 

Why don't you just check for greater than or less than? It's simpler than a regex.

num >= 2000 and num <= 3000
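
Since the question has numbers embedded in text, one way to apply this is to pull out the digit runs first and then compare numerically. A minimal Python sketch, where `numbers_in_range` and the sample text are hypothetical names made up for illustration:

```python
import re

def numbers_in_range(text, low=2000, high=3000):
    """Return every maximal run of digits in text whose value is in [low, high]."""
    # Assumption: leading zeros are not treated specially here.
    return [int(run) for run in re.findall(r"[0-9]+", text)
            if low <= int(run) <= high]

found = numbers_in_range("a3000 x2500y 13000 1999")
print(found)  # [3000, 2500]
```

Because the regex grabs whole digit runs, 13000 is read as one number and rejected by the comparison rather than being mistaken for 3000.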
ghostdog74
+8  A: 

Regular expressions are rarely suitable for checking ranges since for ranges like 27 through 9076 inclusive, they become incredibly ugly. It can be done but you're really better off just doing a regex to check for numerics, something like:

^[0-9]+$

which should work on just about every regex engine, and then check the range manually.

In toto:

import re

def is_between_2k_and_3k(s):
    # Reject anything that is not purely digits before converting.
    if not re.fullmatch(r"[0-9]+", s):
        return False
    i = int(s)
    if i < 2000 or i > 3000:
        return False
    return True

What your particular regex [2000-3000]{4} checks for is exactly four occurrences of any of the following characters: 2, 0, 0, 0-3, 0, 0, 0. In other words, exactly four digits drawn from 0-3.

With letters before and after, you will need to modify the regex and check the correct substring, something like:

import re

def is_between_2k_and_3k_with_letters(s):
    # Optional letters, exactly four digits (captured), optional letters.
    m = re.fullmatch(r"[A-Za-z]*([0-9]{4})[A-Za-z]*", s)
    if not m:
        return False
    i = int(m.group(1))
    if i < 2000 or i > 3000:
        return False
    return True

As an aside, a regex for checking the range 27 through 9076 inclusive would be something like this hideous monstrosity:

^(?:2[7-9]|[3-9][0-9]|[1-9][0-9]{2}|[1-8][0-9]{3}|90[0-6][0-9]|907[0-6])$

I think that's substantially less readable than using ^[1-9][0-9]+$ and then checking with an if statement whether it's between 27 and 9076.
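
For what it's worth, the two approaches can be compared mechanically. The Python sketch below uses a grouped form of such a range pattern (the alternation is wrapped in a non-capturing group so the anchors apply to every branch) and checks it against the plain numeric comparison. The function names are made up for illustration.

```python
import re

# Branches: 27-29, 30-99, 100-999, 1000-8999, 9000-9069, 9070-9076.
RANGE_RE = re.compile(
    r"^(?:2[7-9]|[3-9][0-9]|[1-9][0-9]{2}|[1-8][0-9]{3}|90[0-6][0-9]|907[0-6])$"
)

def in_range_by_regex(s):
    return RANGE_RE.match(s) is not None

def in_range_by_compare(s):
    # Simple numeric check: validate the shape, then compare as an integer.
    return re.fullmatch(r"[1-9][0-9]+", s) is not None and 27 <= int(s) <= 9076

# Every integer on which the two checks disagree (expected: none).
mismatches = [n for n in range(20000)
              if in_range_by_regex(str(n)) != in_range_by_compare(str(n))]
print(mismatches)  # []
```

Both give the same answers, but only the second one is easy to adjust when the bounds change.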

paxdiablo