What would be the regular expression for the following strings?

56AAA71064D6

56AAA7105A25

Would the regular expression change if the numbers rolled over? What I mean is that the above numbers happen to contain hexadecimal values, and I don't know how the value changes once it reaches F. Using the first one as an example: 56AAA71064D6. If this went up to 56AAA71064F6 and the following one became 56AAA7106406, that would call for a different regular expression, because where a letter was allowed there is now a digit. Does this make the regular expression even more difficult? Suggestions?

A manufacturer is going to enter a range of serial numbers. The problem is that different manufacturers have different formats for serial numbers (some are just numbers, some are alphanumeric, some contain extra characters like dashes, and some contain hexadecimal values, which makes it more difficult because I don't know how they roll over to the next serial number). The rollover issue is the biggest problem because the serial numbers are entered as a range like 5A1B - 6F12, and without knowing how they roll over, it seems to me that storing them in the database is not as easy. I was going to give the user the option to input the pattern (expression) and store that in the database, but if a character or characters change from a digit to a letter or vice versa, then the regular expression is no longer valid for certain serial numbers.

Also, the example I gave above is just one case. There are a multitude of serial numbers that would need different expressions.

+3  A: 

There's no single regular expression which is "the" expression to match both of those strings. Instead, there are infinitely many which will do so. Here are two options at opposite ends of the spectrum:

(56AAA71064D6)|(56AAA7105A25)

.*

The first will only match those two strings. The second will match anything. Both satisfy all the criteria you've given.
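To make the two extremes concrete, here is a minimal Python sketch (Python is an assumption; the question names no language). The variable names and the extra test string `not-a-serial` are made up for illustration.

```python
import re

# The two sample serial numbers from the question.
serials = ["56AAA71064D6", "56AAA7105A25"]
# A hypothetical string that is clearly not one of the samples.
other = "not-a-serial"

narrow = re.compile(r"(56AAA71064D6)|(56AAA7105A25)")  # only these two strings
broad = re.compile(r".*")                              # anything at all

# fullmatch requires the whole string to match the pattern.
narrow_hits = [narrow.fullmatch(s) is not None for s in serials]
broad_hits = [broad.fullmatch(s) is not None for s in serials]
narrow_other = narrow.fullmatch(other) is not None
broad_other = broad.fullmatch(other) is not None

print(narrow_hits, broad_hits, narrow_other, broad_other)
```

Both patterns accept both samples; they differ only in what else they accept, which is exactly the information the question leaves out.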

Now, if you specify more criteria, then we'd be able to give a more reasonable idea of the regular expression to provide - and that will drive the answers to the other questions. (At the moment, the only answer that makes sense is "It depends on what regex you use.")

Jon Skeet
On the contrary: there are infinitely many regular expressions that will match both of those strings. Examples? .* a?.* b?.* c?.* Got the picture?
Ingo
@Ingo, you just totally missed the point of his post.
Mike
I just see that you expressed the same somewhat differently. Sorry for that.
Ingo
@Ingo: He never asserted that there were only a finite number of regular expressions that will match both of those strings. Read more carefully. (When he says "no single" he means, there isn't just one)
Platinum Azure
My biggest issue doesn't seem to be the regular expression per se. It is more that if I don't know how one serial number rolls over into the next, that may or may not invalidate the regular expression. Does this make sense?
Xaisoft
@Xaisoft: No - because until you know which regular expression you're going to use, you can't possibly determine whether or not it will be invalidated by the next serial number.
Jon Skeet
@jon, does this mean I might have to store more than 1 regular expression and check the serial number against both to see if it validates?
Xaisoft
@Xaisoft: Not necessarily - it just means that your original question, as asked, makes little sense. You need to know which serial number formats you need to support, and then you can work on getting a useful regular expression.
Jon Skeet
@Jon, yes, I usually don't make any sense, lol. The problem is that the serial number formats could vary greatly, which is why I thought about having the user enter the regular expression. But not all users know regular expressions, and they might enter an incorrect one, so I guess an option might be to enter it myself. What do you think?
Xaisoft
+1 for Ingo for swallowing their pride
El Ronnoco
@Xaisoft: You really should enter it yourself. You're right not to trust users to enter regexes-- on top of lack of technical knowledge for some (most) of them, you also don't want to trust user input more than you have to.
Platinum Azure
@Jon why isn't [A-F0-9]{12} what he's describing even in the first version of the question?
Keng
@Keng: Where in the first version does it say that there will be 12 characters, or that they'll be hex digits? Why would `[A-F0-9]{12}` be any more correct than `.*`? All we were given is two strings which *should* match - but no indication of what *shouldn't* match.
Jon Skeet
@Jon In the original post he said, "the above numbers happen to contain hexadecimal values and I don't know how the value changes one it reaches F." "...where a letter was allowed, now their is a digit...." That's what made me think he was talking about hex and that he only needed a-f and 0-9.
Keng
@Keng: It was the "happen to" bit that made me think it wasn't guaranteed :)
Jon Skeet
@Jon indeed 80)
Keng
+1  A: 

I think you could do it this way for 12 characters. This will search for a 12-character run where each character must be a digit (0 through 9) or a capital letter from A through F.

[A-F0-9]{12}

If you're wanting to include the possibility of dashes then do this.

[A-F0-9\-]{12}

Or if you're wanting to allow for dashes in addition to the 12 characters, then do this. Note that it would pick up any 12-15 character item that fits the criteria, though.

[A-F0-9\-]{12,15}

Or if it's surrounded by spaces (AAAAHHHh...SO is stripping out my spaces!!!)

[A-F0-9\-]{12}

Or if it's surrounded by tabs

\t[A-F0-9\-]{12}\t

Keng
Would the dashes be optional?
Xaisoft
Also, would the dashes be considered part of the length of 12? If not, then the length would have to be 15 if I included 3 dashes, for example, correct?
Xaisoft
@Xaisoft: Yes, the dashes would be optional, and yes, the dashes would count toward the 12-character limit.
Keng
@Keng, thanks. I just verified with an online regular expression validator
Xaisoft
@Keng, I know you can specify a length range like {12,16}, but is there a way to say that I want at least 12 characters, so that the expression will still validate if the input is longer than 12 characters?
Xaisoft
@Xaisoft: There sure is; {12,} will do that.
Keng
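The quantifier behaviour discussed in these comments can be checked with a short Python sketch (the language and the helper name `whole` are assumptions for illustration, not part of the thread). It shows that dashes count toward the length and that `{12,}` means "12 or more".

```python
import re

exactly_12 = re.compile(r"[A-F0-9\-]{12}")    # exactly 12 allowed characters
at_least_12 = re.compile(r"[A-F0-9\-]{12,}")  # 12 or more allowed characters

def whole(pattern, text):
    """Return True when the entire text matches the pattern."""
    return pattern.fullmatch(text) is not None

# 12 hex digits, no dashes.
plain = whole(exactly_12, "56AAA71064D6")
# 12 characters in total, but only 10 hex digits: the dashes use up 2 slots.
dashed = whole(exactly_12, "56AA-A710-64")
# 14 characters: too long for {12}, fine for {12,}.
long_exact = whole(exactly_12, "56AAA71064D6FF")
long_open = whole(at_least_12, "56AAA71064D6FF")
# 6 characters: too short for {12,}.
short_open = whole(at_least_12, "56AAA7")

print(plain, dashed, long_exact, long_open, short_open)
```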
@Keng, this worked. Thanks again.
Xaisoft
@Xaisoft: If you want 12 characters, yet want to allow dashes in the middle that don't count, a plain character class with a length won't do it. Basically I would just iterate through the characters in the string, increment a counter whenever you find 0-9 or A-F, ignore `-` characters, and error out for anything else. It's not too hard to write a 10-line loop that does this.
Platinum Azure
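The counting loop described in the comment above might look like this in Python. This is only a sketch: the function name `is_valid_serial`, the `required_digits` parameter, and the choice to accept uppercase hex digits only are assumptions, not something the thread specifies.

```python
# Assumption: only uppercase hex digits are valid, as in the sample serials.
HEX_DIGITS = set("0123456789ABCDEF")

def is_valid_serial(text, required_digits=12):
    """Count hex digits, skip dashes, and reject any other character."""
    count = 0
    for ch in text:
        if ch in HEX_DIGITS:
            count += 1      # a real digit: counts toward the total
        elif ch == "-":
            continue        # a separator: allowed but not counted
        else:
            return False    # anything else makes the serial invalid
    return count == required_digits

print(is_valid_serial("56AAA71064D6"))    # 12 digits, no dashes
print(is_valid_serial("56AA-A710-64D6"))  # 12 digits plus 2 dashes
print(is_valid_serial("56AA-A710-64"))    # only 10 digits
print(is_valid_serial("56AAA71064G6"))    # G is not a hex digit
```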
@Xaisoft no problem.
Keng
+2  A: 

This matches a string that contains 12 hex digits:

[0-9A-F]{12}
M42
A: 

Well, it sounds like you're describing a 12-digit hexadecimal number:

^[A-F0-9]{12}$

duncan
@duncan - I never understood this, but what is the purpose of ^ and $? I know they symbolize the beginning and end of the string, but don't all inputs that are validated have a beginning and an end? My guess is that ^ means start the expression at the very beginning of the string, and that if it is not there, I can specify at what index I want the expression to start. Do I have this understanding correct? If I do, how would I specify that I want to start evaluating the input against the expression at the 3rd character and stop at the 9th?
Xaisoft
@Xaisoft: If you don't supply ^ and $, then the pattern can occur anywhere in a longer string. You're correct in saying that all strings have a beginning and an end, but those characters assert that the pattern should be "anchored" to the beginning and/or end of the string. For example, the regex `^blah` matches all strings that START with "blah", and who cares what else they have. Similarly, `blah$` matches strings ENDING with "blah". And finally, the regex `^blah$` means just match a string with nothing but "blah" in it.
Platinum Azure
To follow up on my last comment: Regex `blah` (no ^ or $) would match strings like "blah", "blah12031305135", "adongaonblah", and "16blah120513" (note that it just means "blah" occurs anywhere in the string). Since your use case is very specific (match a 12-digit hex number exactly rather than just find it within a longer string), the beginning- and end-of-string anchors ^ and $ fit your use case well.
Platinum Azure
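The effect of the anchors can be seen in a small Python sketch (Python is an assumption here, and the sample sentence is made up). One Python-specific detail worth knowing: `$` also matches just before a trailing newline, so `fullmatch` is the stricter check.

```python
import re

unanchored = re.compile(r"[A-F0-9]{12}")
anchored = re.compile(r"^[A-F0-9]{12}$")

# Hypothetical longer text that merely contains a serial number.
long_text = "serial 56AAA71064D6 shipped"

# Without anchors the pattern is found inside the longer text.
found_unanchored = unanchored.search(long_text) is not None
# With anchors the whole input must be the serial number.
found_anchored = anchored.search(long_text) is not None
exact = anchored.search("56AAA71064D6") is not None

# In Python, $ tolerates one trailing newline; fullmatch does not.
newline_anchored = anchored.search("56AAA71064D6\n") is not None
newline_full = unanchored.fullmatch("56AAA71064D6\n") is not None

print(found_unanchored, found_anchored, exact, newline_anchored, newline_full)
```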
@Platinum, thanks for the clarification.
Xaisoft
+1  A: 

Assuming these are all 12-digit hexadecimal numbers, which it looks like they are, the following regex should work:

[0-9A-Fa-f]{12}

Here I'm using a character class to say that I want any digit, OR A-F, OR a-f. As a bonus I'm allowing lowercase letters; if you don't want those, just take them out of the regex.

As Jon Skeet and others have said, you really didn't provide enough information, so if you don't like this answer, please understand that I was doing the best I could with the information you provided.

Platinum Azure
Thanks for the help so far, but the example I gave is just one serial number format. If I have different formats, then I was thinking I need to store the expressions in the database? With your expression above, does it matter if one of the characters turns from a letter into a number or vice versa?
Xaisoft
This regular expression will hold for all 12-digit hexadecimal numbers, including handling "rollover" cases. Read up on character classes here: http://www.regular-expressions.info/charclass.html
Platinum Azure
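To illustrate the rollover point, here is a hedged Python sketch. It assumes the serial numbers really are plain hexadecimal counters that increase by one, which the thread never confirms; a manufacturer could use a different scheme. The helper names `next_serial` and `in_range` are hypothetical.

```python
import re

HEX12 = re.compile(r"[0-9A-F]{12}")

def next_serial(serial):
    """Add 1 to a hex serial, keeping its width (assumes plain hex counting)."""
    width = len(serial)
    value = int(serial, 16) + 1
    # Format as uppercase hex and left-pad with zeros to the original width.
    return format(value, "X").zfill(width)

def in_range(serial, low, high):
    """Check a serial against an inclusive range by comparing numeric values."""
    return int(low, 16) <= int(serial, 16) <= int(high, 16)

# F rolls over to 0 and carries into the next position: ...DF becomes ...E0.
rolled = next_serial("56AAA71064DF")
print(rolled)
# The same character class still matches after the rollover.
print(HEX12.fullmatch(rolled) is not None)
# A range such as 5A1B - 6F12 can be checked numerically, without a regex.
print(in_range("5B00", "5A1B", "6F12"))
print(in_range("7000", "5A1B", "6F12"))
```

Under that assumption, the regex only validates the format, while range membership is a numeric comparison, so a position changing from a letter to a digit does not affect either check.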
+1  A: 

So, why not [0-9A-F]{12}?

Ingo