I need to validate serial numbers. We do this with regular expressions in C#, and for a certain product, part of the serial number is the "seconds since midnight". There are 86400 seconds in a day, so how can I validate this field as a 5-digit number in that range in this string?

654984051-86400-231324

I can't use this approach:

[0-8][0-6][0-4][0-0][0-0]

Because then 86399 wouldn't be valid. How can I overcome this? I want something like:

[00000-86400]

UPDATE
I want to make it clear that I'm aware of, and agree with, the "don't use regular expressions when there's a simpler way" school of thought. Jason's answer is exactly how I'd like to do it; however, this serial number validation is applied to all serial numbers that pass through our system, and there's currently no custom validation code for these specific ones. In this case I have a good reason for looking for a regex solution.

Of course, if there isn't one, then that makes the case for custom validation for these particular products undeniable, but I wanted to explore this avenue fully before going with a solution that requires code changes.

+9  A: 

Why not skip the regex? If you're struggling to come up with a regex to parse this, that suggests it may be too complex and you should find something simpler. I see absolutely no benefit to using a regex here when a simple

int value;
// s is the seconds-since-midnight field, already split out of the serial number
if(!Int32.TryParse(s, out value)) {
    throw new ArgumentException();
}
if(value < 0 || value > 86400) {
    throw new ArgumentOutOfRangeException();
}

will work just fine. It's clear and easy to maintain.
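One caveat with the snippet above: with its default NumberStyles.Integer, Int32.TryParse also accepts leading/trailing whitespace and a leading sign, so inputs like " +86400" or "-0" pass. A minimal sketch of a stricter variant, assuming the field must be exactly five zero-padded ASCII digits (the class and method names are hypothetical):

```csharp
using System.Globalization;

public static class SecondsField
{
    // Hypothetical helper: true if s is exactly five digits whose value is 0..86400.
    public static bool IsValidSecondsField(string s)
    {
        // Require the 5-character, zero-padded format used in the serial number.
        if (s == null || s.Length != 5)
            return false;

        // NumberStyles.None rejects whitespace, signs, thousands separators, etc.
        if (!int.TryParse(s, NumberStyles.None, CultureInfo.InvariantCulture, out int value))
            return false;

        // The range check stays arithmetic rather than being encoded in a regex.
        return value >= 0 && value <= 86400;
    }
}
```

If unpadded values such as "7" should also be accepted, the length check can be relaxed to `s.Length >= 1 && s.Length <= 5`.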

Jason
Regex is a great, powerful tool, but I think people reach for it way too often and too quickly any time a parsing/validation problem comes up.
Jason
Whoa, hold your horses - this serial number validation is for all serial numbers that pass through our system - there's no custom validation code for these specific ones. I know to avoid regex if possible, but there are good reasons for it *in this case*.
Neil Barnwell
That sounds like a great reason to add hooks to your system.
Ken
+5  A: 

You don't want to do the range check with a regular expression; you'll end up with something incomprehensible, unwieldy, and difficult to modify (somebody will probably suggest one :). Instead, use a regex to make sure the string contains digits in the format you want, then pull out a capture group and check its range with an arithmetic comparison. For example, in pseudocode:

match regex /(\d+)-(\d+)-(\d+)/
serial = capture group 2
if serial >= 0 and serial <= 86400 then
    // serial is valid
end if
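The pseudocode translates fairly directly to C#. A minimal sketch, assuming the three-part hyphenated layout of the example serial (SerialValidator and HasValidSeconds are hypothetical names). It anchors the pattern and uses TryParse so that an overlong digit run is rejected instead of throwing:

```csharp
using System.Text.RegularExpressions;

public static class SerialValidator
{
    // The pseudocode's pattern, anchored with ^ and \z so the whole string must match.
    // [0-9] is used instead of \d because in .NET \d also matches non-ASCII Unicode digits.
    private static readonly Regex SerialPattern =
        new Regex(@"^([0-9]+)-([0-9]+)-([0-9]+)\z");

    // Hypothetical helper: the regex checks the shape, arithmetic checks the range.
    public static bool HasValidSeconds(string serial)
    {
        Match match = SerialPattern.Match(serial);
        if (!match.Success)
            return false;

        // Capture group 2 holds the seconds-since-midnight field.
        string field = match.Groups[2].Value;

        // TryParse returns false if the digit run is too long to fit in an int.
        if (!int.TryParse(field, out int seconds))
            return false;

        return seconds >= 0 && seconds <= 86400;
    }
}
```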
Greg Hewgill
A: 

I don't believe this is possible with regular expressions, since this isn't something that can be checked as part of a regular language. In other words, a finite-state automaton cannot recognize this string, so a regular expression cannot either.

Edit: This can be recognized by a regex, but not in an elegant way. It would require a monster alternation (OR) chain (e.g. 00000|00001|00002 or 0{1,5}|0{1,4}1|0{1,4}2). To me, having to enumerate such a large set of possibilities makes it clear that while it is technically possible, it is not feasible or manageable.
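To make the scale concrete, here is a hedged sketch that generates the fully enumerated alternation programmatically (EnumeratedRange and BuildPattern are hypothetical names). For 0..86400 it produces 86,401 five-character alternatives joined by 86,400 pipes, i.e. a 518,405-character pattern:

```csharp
using System.Linq;

public static class EnumeratedRange
{
    // Builds "00000|00001|...|max": one zero-padded alternative per allowed value.
    public static string BuildPattern(int max)
    {
        return string.Join("|", Enumerable.Range(0, max + 1).Select(i => i.ToString("D5")));
    }
}
```

The pattern is correct, but nobody can review or maintain it by eye, which is the point being made above.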

Justin Johnson
Is this true? I don't know much about FSAs, but a hypothetical counterexample is "00000|00001|...|86400".
Jimmy
It certainly can, because the set of string representations of the whole numbers between 0 and 86400 is finite, and every finite set can be accepted by a finite-state automaton.
Welbog
You're both right about my omission. I've edited my answer.
Justin Johnson
It is feasible in an "elegant" manner by generating the regex programmatically, but that would be turning the problem statement upside down. Anyway, as per the consensus, a regex is definitely not appropriate for this type of use case.
Romain
Generating a 518399-character string is not exactly what I would call elegant, and it seems to me that it would be very inefficient to parse and compare against.
Justin Johnson
See Jimmy's answer - it's a regex that matches the problem definition, and is only 41 characters long, not 518399...
GalacticCowboy
True, but that's also not generating the regex programmatically, which is what the 518399 was in reference to.
Justin Johnson
A: 

If you really need a pure regex solution, I believe this would work, although the other posters make a good point about only validating that the field is all digits and then using a capture group to validate the actual number:

([0-7][0-9]{4})|(8[0-5][0-9]{3})|(86[0-3][0-9]{2})|(86400)
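To use this inside a full serial-number pattern in C#, the alternation has to be grouped and the pattern anchored. Otherwise `|` splits the whole pattern, and an unanchored match can succeed on a substring such as the first five digits of 654984051. A minimal sketch, assuming the 9-5-6 digit layout of the example serial (SerialPatterns is a hypothetical name):

```csharp
using System.Text.RegularExpressions;

public static class SerialPatterns
{
    // The answer's alternation, grouped with (?: ) so the branches apply only to the middle field.
    private const string Seconds = @"(?:[0-7][0-9]{4}|8[0-5][0-9]{3}|86[0-3][0-9]{2}|86400)";

    // Assumed layout: 9 digits, hyphen, 5-digit seconds field, hyphen, 6 digits.
    public static readonly Regex Serial = new Regex(@"^[0-9]{9}-" + Seconds + @"-[0-9]{6}\z");
}
```

The values 79800 and 83400 raised in the comments below both match, and 86401 does not.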
Taylor Leese
Not going to work, as that fails to validate 79800.
Broam
You're right. I fixed it.
Taylor Leese
What about 83400?
GalacticCowboy
Third time's a charm, hopefully.
Taylor Leese
Yep, that looks right.
GalacticCowboy
+6  A: 

With the standard 'this-is-not-a-particularly-regexy-problem' caveat,

[0-7]\d{4}|8[0-5]\d{3}|86[0-3]\d{2}|86400
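Because there are only 100,000 five-digit strings, a pattern like this can be checked exhaustively against the plain arithmetic comparison. A minimal sketch, assuming zero-padded input, which is what this pattern expects (ExhaustiveCheck and FirstMismatch are made-up names):

```csharp
using System.Text.RegularExpressions;

public static class ExhaustiveCheck
{
    // The answer's pattern, anchored so it must match the whole 5-character string.
    private static readonly Regex Seconds =
        new Regex(@"^(?:[0-7]\d{4}|8[0-5]\d{3}|86[0-3]\d{2}|86400)\z");

    // Returns the first 5-digit string on which the regex and the numeric check disagree,
    // or null if they agree on every value from 00000 to 99999.
    public static string FirstMismatch()
    {
        for (int i = 0; i <= 99999; i++)
        {
            string s = i.ToString("D5");
            bool byRegex = Seconds.IsMatch(s);
            bool byNumber = i <= 86400;
            if (byRegex != byNumber)
                return s;
        }
        return null;
    }
}
```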
Jimmy
Robert Harvey's version also handles numbers under 10000 that aren't 0-padded.
Jimmy
+4  A: 

The "Generate a Regular Expression to Match an Arbitrary Numeric Range" utility at http://utilitymill.com/utility/Regex_For_Range yields the following regex:

\b0*([0-9]{1,4}|[1-7][0-9]{4}|8[0-5][0-9]{3}|86[0-3][0-9]{2}|86400)\b

Description of output:

First, break into equal-length ranges:
  0 - 9
  10 - 99
  100 - 999
  1000 - 9999
  10000 - 86400

Second, break into ranges that yield simple regexes:
  0 - 9
  10 - 99
  100 - 999
  1000 - 9999
  10000 - 79999
  80000 - 85999
  86000 - 86399
  86400 - 86400

Turn each range into a regex:
  [0-9]
  [1-9][0-9]
  [1-9][0-9]{2}
  [1-9][0-9]{3}
  [1-7][0-9]{4}
  8[0-5][0-9]{3}
  86[0-3][0-9]{2}
  86400

Collapse adjacent powers of 10:
  [0-9]{1,4}
  [1-7][0-9]{4}
  8[0-5][0-9]{3}
  86[0-3][0-9]{2}
  86400

Combining the regexes above yields:
  0*([0-9]{1,4}|[1-7][0-9]{4}|8[0-5][0-9]{3}|86[0-3][0-9]{2}|86400)
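The utility's own source isn't shown here. Below is a hypothetical C# sketch of the same idea (RangeRegex and Build are made-up names): split the range by digit count, then recursively split each equal-length range on its leading digit. It skips the "collapse adjacent powers of 10" step and the `0*`/`\b` wrapping, so its output is longer than the regex above, and it only accepts numbers written without leading zeros.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

public static class RangeRegex
{
    // Builds an anchored regex matching the unpadded decimal numbers lo..hi (0 <= lo <= hi).
    public static string Build(int lo, int hi)
    {
        var parts = new List<string>();
        // Step 1: split into ranges whose endpoints have the same number of digits.
        for (int len = lo.ToString().Length; len <= hi.ToString().Length; len++)
        {
            long start = Math.Max(lo, len == 1 ? 0 : Pow10(len - 1));
            long end = Math.Min(hi, Pow10(len) - 1);
            parts.Add(Equal(start.ToString(), end.ToString()));
        }
        return "^(?:" + string.Join("|", parts) + @")\z";
    }

    // Step 2: regex for a..b, where a and b are digit strings of equal length and a <= b.
    private static string Equal(string a, string b)
    {
        if (a == b) return a;
        if (a.Length == 1) return "[" + a + "-" + b + "]";
        if (a[0] == b[0]) return a[0] + Group(Equal(a.Substring(1), b.Substring(1)));

        string restA = a.Substring(1), restB = b.Substring(1);
        int n = restA.Length;
        bool lowIsRound = restA.All(c => c == '0');   // a looks like d000...
        bool highIsRound = restB.All(c => c == '9');  // b looks like d999...
        var parts = new List<string>();

        // a[0] followed by restA..999, needed only when a is not a round lower bound.
        if (!lowIsRound)
            parts.Add(a[0] + Group(Equal(restA, new string('9', n))));

        // Middle block: a run of leading digits followed by any n digits.
        char first = lowIsRound ? a[0] : (char)(a[0] + 1);
        char last = highIsRound ? b[0] : (char)(b[0] - 1);
        if (first <= last)
            parts.Add(Digit(first, last) + "[0-9]{" + n + "}");

        // b[0] followed by 000..restB, needed only when b is not a round upper bound.
        if (!highIsRound)
            parts.Add(b[0] + Group(Equal(new string('0', n), restB)));

        return string.Join("|", parts);
    }

    private static string Digit(char first, char last) =>
        first == last ? first.ToString() : "[" + first + "-" + last + "]";

    // Wrap an alternation in a non-capturing group so a prefix digit applies to every branch.
    private static string Group(string s) => s.IndexOf('|') >= 0 ? "(?:" + s + ")" : s;

    private static long Pow10(int n)
    {
        long r = 1;
        for (int i = 0; i < n; i++) r *= 10;
        return r;
    }
}
```

For example, `Build(25, 314)` gives `^(?:2[5-9]|[3-9][0-9]{1}|[1-2][0-9]{2}|3(?:0[0-9]{1}|1[0-4]))\z`, and for 0..86400 the five-digit branch comes out as `[1-7][0-9]{4}|8(?:[0-5][0-9]{3}|6(?:[0-3][0-9]{2}|400))`, which is the same set of sub-ranges listed above.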

Tested here: http://osteele.com/tools/rework/

Robert Harvey
A: 

I would use a regex combined with some .NET code to accomplish this. A pure regex solution isn't going to be easy or efficient for handling large numeric ranges.

But this will:

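using System.Text.RegularExpressions;
Regex myRegex = new Regex(@"^\d{9}-(\d{5})-\d{6}$");
Match match = myRegex.Match("654984051-86400-231324");
string value = match.Success ? match.Groups[1].Value : null; // "86400"; null if the format doesn't match (Replace would return the input unchanged)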

This grabs the value 86400 in this case. Then you'd just check that the captured number is between 0 and 86400, as in Jason's answer.

Steve Wortham