Hi,

I'm trying to write a regex to validate part or model numbers.

These can contain letters, numbers, '-', '/' and spaces. They must contain at least 1 number and be between 4 and 20 characters long.

Here are some examples of the strings I want to match:

CVA 620 999
M3094
26250
APL8215/APL8225
1301
02-700401

This is what I have so far:

([\w- /]*\d){3,19}

It seems to work, except that it also matches strings such as "This is my model APL8215"; I only want it to match the "APL8215" part.

Is there any way to match model numbers like this using regular expressions?

Any help very much appreciated!

A: 

Unfortunately, because your criteria are so flexible, the regex will inevitably match strings like the one you describe above. If you could restrict your criteria further, for example to allow only capital letters, then you would be able to pick up these codes with a regex such as:

[A-Z0-9- ]{4,20}

Tim
A: 

Since it seems not to be possible to group all model numbers under the same umbrella, I'd use more than one regexp:

  • CVA xxx xxx
  • Mxxxx
  • xxxxx
  • APLxxxx

where x is a digit (from your examples), and so on. Once each regexp has extracted its subset, you can refine your parsing or concatenate the results into the same output.

lorenzog
+1  A: 

IMO it's better to make one regex per model number format and then combine them in one big regex.

Example: r = (modelA_regex)|(modelB_regex)|(modelC_regex)
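
A minimal Python sketch of this idea, assuming Python's `re` module (the thread does not say which regex engine is in use). The three formats in `FORMATS` and the helper `find_models` are hypothetical, guessed from the examples in the question; they are not a definitive list of formats.

```python
import re

# Hypothetical per-format patterns, guessed from the question's examples.
FORMATS = [
    r"[A-Z]{3} \d{3} \d{3}",   # e.g. CVA 620 999
    r"[A-Z]{1,3}\d{4}",        # e.g. M3094, APL8215
    r"\d{2}-\d{6}",            # e.g. 02-700401
]

# r = (modelA_regex)|(modelB_regex)|(modelC_regex), wrapped in word boundaries.
MODEL_RE = re.compile(r"\b(?:" + "|".join(f"(?:{p})" for p in FORMATS) + r")\b")

def find_models(text):
    """Return every substring of text that matches one of the listed formats."""
    return [m.group(0) for m in MODEL_RE.finditer(text)]

print(find_models("This is my model APL8215"))          # ['APL8215']
print(find_models("Parts: CVA 620 999 and 02-700401"))  # ['CVA 620 999', '02-700401']
```

Because each alternative is strict, surrounding words are not picked up, but any format missing from the list is silently skipped.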

Nick D
Thanks Nick. Unfortunately I don't have a definitive list of what all the number formats could be, plus they are user entered (Doing some data mining) so don't tend to stick to any format anyway!
carok
+1  A: 

This is as close as I can get:

(?=.*\d)[\w\d\- ]{4,20}

Unfortunately it doesn't work with the example "This is my model APL8215", because the rules are permissive enough to match "This is my model APL" as a valid part number before the regex ever reaches the "APL8215" part.
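
The behaviour can be reproduced with a short Python sketch (assuming Python's `re` module; other engines may differ in details). The lookahead only checks that a digit exists somewhere ahead, so the first 20 allowed characters are consumed even though they contain no digit.

```python
import re

# The pattern from this answer, unchanged.
pattern = re.compile(r"(?=.*\d)[\w\d\- ]{4,20}")

text = "This is my model APL8215"
matches = pattern.findall(text)

print(matches)          # ['This is my model APL', '8215']
print(len(matches[0]))  # 20, the upper limit of the quantifier
```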

cxfx
A: 

I think this one solves your problem:

\b((?=[A-Za-z/ -]{0,19}\d)[A-Za-z0-9/ -]{4,20})\b

It looks for a string of 4 to 20 characters drawn from [A-Za-z0-9/ -], and this string must be "on its own" (\b stands for a word boundary). The string must also contain at least one digit; this is enforced by a lookahead expression: (?=[A-Za-z/ -]{0,19}\d).

With the following sample:

CVA 620 999
M3094
26250
APL8215/APL8225
1301
1232-1231
02-700401
DGEIVEOCE
cdzjkblcvsz#56464e
siovbsbf~1313/
APL8215/APL8225APL8215/APL8225

I get:

"CVA 620 999" 
"M3094" 
"26250" 
"APL8215/APL8225" 
"1301" 
"1232-1231" 
"02-700401" 
"56464e" 
"1313"
"APL8215/"
"APL8225APL8215/"
"APL8225"

The last results are explained by where word boundaries occur: the position next to a '/' can be a word boundary. If you want to solve that problem, you must use a lookbehind before and a lookahead after the main regex.

Is that what you wanted?
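
One possible reading of that suggestion, sketched in Python (assuming the `re` module). The names `BOUNDARY` and `GUARDED` are illustrative, and the choice to leave the space out of the guard classes is an assumption, so that a code can still be found inside a sentence.

```python
import re

# The original pattern, using \b on both sides.
BOUNDARY = re.compile(r"\b(?=[A-Za-z/ -]{0,19}\d)[A-Za-z0-9/ -]{4,20}\b")

# Assumed variant: \b replaced by explicit lookarounds.
GUARDED = re.compile(
    r"(?<![A-Za-z0-9/-])"        # no allowed non-space character just before
    r"(?=[A-Za-z/ -]{0,19}\d)"   # a digit within the first 20 characters
    r"[A-Za-z0-9/ -]{4,20}"      # 4 to 20 allowed characters
    r"(?![A-Za-z0-9/-])"         # no allowed non-space character just after
)

too_long = "APL8215/APL8225APL8215/APL8225"
print(BOUNDARY.findall(too_long))  # ['APL8215/', 'APL8225APL8215/', 'APL8225']
print(GUARDED.findall(too_long))   # []

print(GUARDED.findall("APL8215/APL8225"))           # ['APL8215/APL8225']
print(GUARDED.findall("This is my model APL8215"))  # ['is my model APL8215']
```

The over-long string is no longer split into pieces, but because spaces are allowed inside a part number, words in front of the code are still pulled into the match.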

Arno
Thanks Arno. It's still pulling out some of the surrounding words, but I don't think there is any way around this!
carok