I need to match a fixed-width field in a file layout with a regular expression. The field is a numeric integer, always has four characters, and lies in the range 0..1331. When the number is smaller than 1000, the string is left-padded with zeros. So all these examples are valid:

  • 0000
  • 0001
  • 0010
  • 1000
  • 1331

But the following must not be accepted:

  • 1
  • 01
  • 10
  • 100
  • 4759

It would be nice if I could enforce this restriction with a regex alone. After playing around a bit, I came up with the expression \0*[0-1331]\. The problem is that it does not restrict the size to four characters. Of course I could do \000[0-9]|00[10-99]|0[100-999]|[1000-1331]\ but I refuse to use something so nasty. Can anyone think of a better way?

+6  A: 

Regular expressions are not the answer to every single problem. My advice would be to do something like:

boolean isValidSomethingOrOther (string):
    if string.length() != 4:
        return false
    for each character in string:
        if not character.isNumeric():
            return false
    if string.toInt() > 1331:
        return false
    return true
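
As a rough Python sketch of the pseudocode above: the name `is_valid_field` is hypothetical, and the digit test is restricted to ASCII `0`-`9` on the assumption that the file layout only ever holds ASCII digits.

```python
def is_valid_field(s: str) -> bool:
    """Return True if s is exactly four ASCII digits with a value of 0..1331."""
    # Fixed width: anything that is not four characters long is rejected.
    if len(s) != 4:
        return False
    # Every character must be an ASCII digit (assumption: no Unicode digits).
    if not all(c in "0123456789" for c in s):
        return False
    # Only now is it safe to convert and apply the upper bound.
    return int(s) <= 1331
```

Because every character has to be a digit, a string such as `12.2` is rejected before the integer conversion is attempted.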

If you must use a regex, there's nothing wrong with your solution but I'd probably use the following variant (just based on my understanding of RE engines and how they work):

^(0[0-9]{3}|1[0-2][0-9]{2}|13[0-2][0-9]|133[01])$
  • The first section matches 0000-0999.
  • The second matches 1000-1299.
  • The third matches 1300-1329.
  • The final one matches 1330 and 1331.
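
One way to gain confidence in the four ranges listed above is to try every four-digit string. The sketch below uses Python's `re` module; the names `FIELD_RE` and `matches_field` are made up for illustration. The alternatives are wrapped in a group and checked with `fullmatch`, so that the whole string has to match rather than just the first or last alternative.

```python
import re

# The four alternatives, grouped so the whole-string check applies to all of them.
FIELD_RE = re.compile(r"(?:0[0-9]{3}|1[0-2][0-9]{2}|13[0-2][0-9]|133[01])")

def matches_field(s: str) -> bool:
    """Return True if the entire string s is matched by FIELD_RE."""
    return FIELD_RE.fullmatch(s) is not None

# Exhaustive check over every zero-padded four-digit string.
accepted = [n for n in range(10000) if matches_field(f"{n:04d}")]
assert accepted == list(range(1332))  # exactly 0000 through 1331
```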

Update:

Just on the elegance comment, there are many forms of elegance, of which regexes are one. You can also achieve elegance just by abstracting the validation out to a separate function or macro and then calling it from your code:

if isValidSomethingOrOther(str) ...

where SomethingOrOther is a concrete business object. This allows you to change your idea of a valid object easily, even using a regex as you desire or any other checks you deem appropriate (such as my function above).

This allows you to cater for any changes down the line, such as a requirement that these objects now have to be prime numbers.

I'm sure I could write a "prime-number-less-than-1332" regex. I'm equally sure I wouldn't want to - I'd prefer to code that up as a function (or lookup table for raw speed), especially since the regex would most likely just look like:

^(2|3|5|7| ... |1327)$

anyway.
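
For what it's worth, the lookup-table idea could be sketched like this in Python. The names `primes_below`, `VALID_PRIME_FIELDS` and `is_valid_prime_field` are hypothetical, and the sketch assumes the field stays zero-padded to four characters under the imagined prime-number requirement.

```python
def primes_below(limit: int) -> list[int]:
    """Sieve of Eratosthenes: return all primes strictly below limit."""
    is_prime = [True] * limit
    is_prime[0:2] = [False, False]  # 0 and 1 are not prime
    for n in range(2, int(limit ** 0.5) + 1):
        if is_prime[n]:
            for multiple in range(n * n, limit, n):
                is_prime[multiple] = False
    return [n for n, flag in enumerate(is_prime) if flag]

# Lookup table built once: the zero-padded four-character strings that are valid.
VALID_PRIME_FIELDS = frozenset(f"{p:04d}" for p in primes_below(1332))

def is_valid_prime_field(s: str) -> bool:
    """Set membership covers the width, digit and primality checks at once."""
    return s in VALID_PRIME_FIELDS
```

The caller still only sees one validation function, so swapping the rule does not touch the calling code.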

paxdiablo
I agree with you, but *in this particular case* they are a much more elegant solution.
Lailson Bandeira
Fair enough then: if you need to use an RE, then use it. It doesn't really matter how ugly it looks since you're going to document it right there in the code, just before it's specified, right? Or call the match string something intelligent, like reMatch0000Thru1331? :-)
paxdiablo
As I said above, when I made this comment about elegance, I thought I could write a simple regex to solve this problem, but now I'm going to match it with `\d{4}` and check the range in program code when I make the capture.
Lailson Bandeira
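
A minimal sketch of that "match four digits, then check the range in code" approach, in Python. The name `parse_field` is hypothetical, and `[0-9]` is used instead of `\d` because Python 3's `\d` also matches non-ASCII digits in `str` patterns unless `re.ASCII` is set.

```python
import re
from typing import Optional

FOUR_DIGITS = re.compile(r"[0-9]{4}")

def parse_field(text: str) -> Optional[int]:
    """Return the field's integer value, or None if it is malformed or out of range."""
    # The regex only enforces the shape: exactly four ASCII digits.
    if FOUR_DIGITS.fullmatch(text) is None:
        return None
    # The range restriction is handled by ordinary code.
    value = int(text)
    return value if value <= 1331 else None
```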
+1  A: 

This seems too easy, am I understanding the problem correctly?

\[01][0-9]{3}\

I don't know what .. means, integer in range? That must be a perlism or something.

This seems to work the way you want to me:

In [3]: r = re.compile(r'[01][0-9]{3}')

In [4]: r.match('0001')
Out[4]: <_sre.SRE_Match object at 0x2fa2d30>

In [5]: r.match('1001')
Out[5]: <_sre.SRE_Match object at 0x2fa2cc8>

In [6]: r.match('2001')

In [7]: r.match('001')

In [8]:
That also matches all the numbers from 1332 to 1999 inclusive, which is incorrect. The question states the number shouldn't be higher than 1331.
paxdiablo
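
The over-matching is easy to confirm by brute force. In this small Python sketch the names `loose`, `wrongly_accepted` and `prefix_hit` are made up for illustration.

```python
import re

loose = re.compile(r"[01][0-9]{3}")

# Every zero-padded four-digit string above 1331 that the pattern still accepts.
wrongly_accepted = [
    n for n in range(10000)
    if n > 1331 and loose.fullmatch(f"{n:04d}")
]
assert wrongly_accepted == list(range(1332, 2000))

# Separately: match() is only anchored at the start, so a longer
# string with a valid-looking four-character prefix also gets through.
prefix_hit = loose.match("00012") is not None
assert prefix_hit
```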
Oops, you're right, Pax. So I think I'll match with `/\d{4}/` and ensure the range with some code. Thanks for your observation.
Lailson Bandeira
A: 

boolean isValidSomethingOrOther (string):
    if string.length() != 4:
        return false
    for each character in string:
        if not character.isNumeric():
            return false
    if string.toInt() > 1331:
        return false
    return true

will not work, as it will pass 12.2 or any decimal less than 1331, e.g. 34.5 or 55.6.

santosh