I use the following regular expression to validate a comma-separated list of values.

^(Dog|Cat|Bird|Mouse)(, (Dog|Cat|Bird|Mouse))*$

The values are also listed in a drop-down list in Excel cell validation, so the user can either select a single value from the drop-down list or type in multiple values separated by commas.

The regular expression does a good job of preventing the user from entering anything but the approved values, but it doesn't prevent duplicates. For example, the user can enter "Dog" or "Dog, Cat", but can also enter "Dog, Dog".

Is there any way to prevent duplicates with a similar single regular expression? In other words, I need to enforce a comma-separated list of distinct, approved values.

Thanks!

A:

Use a backreference and a negative lookahead:

^(Dog|Cat|Bird|Mouse)(, (?!\1)(Dog|Cat|Bird|Mouse))*$

EDIT: This won't work for cases such as "Cat, Dog, Dog", because \1 only holds the first value. You'll need to come up with a hybrid solution for such cases; I don't believe there is a single regex that can handle that.
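A minimal sketch of one such hybrid in JavaScript, assuming the items are always separated by exactly ", ": the regex enforces the approved values, and a `Set` catches a duplicate anywhere in the list, not just a repeat of the first value. `LIST_PATTERN` and `isValidList` are hypothetical names.

```javascript
// Hypothetical helper: the regex checks the approved values and the ", " separators,
// then a Set detects repeated entries (which a single backreference cannot track).
const LIST_PATTERN = /^(?:Dog|Cat|Bird|Mouse)(?:, (?:Dog|Cat|Bird|Mouse))*$/;

function isValidList(input) {
  if (!LIST_PATTERN.test(input)) {
    return false; // unknown value or malformed separator
  }
  const items = input.split(", ");
  // A Set keeps only distinct items, so a size mismatch means a duplicate.
  return new Set(items).size === items.length;
}

console.log(isValidList("Dog, Cat"));      // true
console.log(isValidList("Cat, Dog, Dog")); // false
console.log(isValidList("Dog, Horse"));    // false
```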


Here's another technique. You need to check two things. First, check that it DOES match this:

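^(?:Dog|Cat|Bird|Mouse)(?:, (?:Dog|Cat|Bird|Mouse))*$

(That's the same check as your original regex, with the values grouped so that ^ and $ anchor the whole list. Without the leading ^, a string such as "Horse, Dog" would still match.)

Then, check that it DOES NOT match this:

(Dog|Cat|Bird|Mouse).+?\1

E.g.

var valid = /^(?:Dog|Cat|Bird|Mouse)(?:, (?:Dog|Cat|Bird|Mouse))*$/.test(string) &&
           !/(Dog|Cat|Bird|Mouse).+?\1/.test(string);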
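If a single pattern is preferred, the two checks can also be folded together by moving the DOES NOT check into a negative lookahead anchored at ^. A sketch, assuming the same four values and ", " separator; the `\b` word boundaries are an addition so that a value only counts as repeated when it appears as a whole word. `NO_DUPLICATES` is a hypothetical name.

```javascript
// The DOES NOT check sits in a negative lookahead at the start, so one pattern
// both whitelists the values and rejects any value that appears twice.
const NO_DUPLICATES =
  /^(?!.*\b(Dog|Cat|Bird|Mouse)\b.*\b\1\b)(?:Dog|Cat|Bird|Mouse)(?:, (?:Dog|Cat|Bird|Mouse))*$/;

console.log(NO_DUPLICATES.test("Dog, Cat"));      // true
console.log(NO_DUPLICATES.test("Dog, Dog"));      // false
console.log(NO_DUPLICATES.test("Cat, Dog, Dog")); // false
```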
J-P
Yup, close, but doesn't catch "Cat, Dog, Dog". Thanks!
Kuyenda
I don't know if this would work, but maybe you can do the back reference (?!(\1|\2|\5|\8)) and change the * to 4. I think this might blow up if the capture groups don't exist, though. I didn't try it; just a thought.
Tim
Why "\1\2\5\8"? Wouldn't it be "\1\2\3\4"? Thanks!
Kuyenda
There are 3 capture groups in the second set. Although actually, there are two. I don't think the approach above will work, but if you are generating the regular expressions, you could try just listing out all the options: ^(Dog|Cat|Bird|Mouse)(?:, (?!\1)(Dog|Cat|Bird|Mouse))(?:, (?!\1|\2)(Dog|Cat|Bird|Mouse)) and so on and so forth. That said, it seems like what you really want to do is validate it programmatically.
Tim
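The "list out all the options" idea can be generated rather than written by hand. A sketch, assuming the values contain no regex metacharacters; `buildNoRepeatPattern` is a hypothetical name. Each later item is nested inside the previous optional group, so the capture groups are numbered 1, 2, 3, ... in order and item n can exclude \1 through \(n-1).

```javascript
// Hypothetical generator for the "list out all the options" idea.
// Assumes the values contain no regex metacharacters.
function buildNoRepeatPattern(values) {
  const alt = "(" + values.join("|") + ")"; // one capturing group per item
  let tail = "";
  // Build from the last item outward; item i may not repeat items 1..i-1.
  for (let i = values.length; i >= 2; i--) {
    const refs = [];
    for (let j = 1; j < i; j++) {
      refs.push("\\" + j);
    }
    // (?:,|$) makes the lookahead compare whole items, not prefixes.
    tail = "(?:, (?!(?:" + refs.join("|") + ")(?:,|$))" + alt + tail + ")?";
  }
  return new RegExp("^" + alt + tail + "$");
}

const pattern = buildNoRepeatPattern(["Dog", "Cat", "Bird", "Mouse"]);
console.log(pattern.test("Dog, Cat, Bird")); // true
console.log(pattern.test("Cat, Dog, Dog"));  // false
```

Because every item must be distinct, the generated pattern also caps the list at values.length items.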
J-P, did you intentionally leave out ^ and $ in your DOES NOT regex? I am testing the two part solution right now.
Kuyenda
I implemented the two-part solution in my code. I used to store only the true part in the database; now I have the option of storing a regular expression for a true part, a false part, or both. Not only does it let me check for duplicates in a list like `Dog|Cat|Bird|Mouse`, it also makes my data validation much more flexible and robust in general. Thanks.
Kuyenda
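A sketch of how a stored true part and false part might be applied, assuming both are held as JavaScript `RegExp` objects and either may be absent. The `validate` function and the `mustMatch` / `mustNotMatch` field names are hypothetical, not taken from the post.

```javascript
// Hypothetical rule shape: mustMatch is the "true part", mustNotMatch the "false part".
// Either may be omitted, mirroring the option of storing one or both.
function validate(value, rule) {
  if (rule.mustMatch && !rule.mustMatch.test(value)) {
    return false;
  }
  if (rule.mustNotMatch && rule.mustNotMatch.test(value)) {
    return false;
  }
  return true;
}

const animalRule = {
  mustMatch: /^(?:Dog|Cat|Bird|Mouse)(?:, (?:Dog|Cat|Bird|Mouse))*$/,
  mustNotMatch: /(Dog|Cat|Bird|Mouse).+?\1/,
};

console.log(validate("Bird, Mouse", animalRule)); // true
console.log(validate("Bird, Bird", animalRule));  // false
```

Stored patterns should avoid the g flag here: test() on a global regex advances lastIndex, so repeated calls can give different answers.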
The spaces in the above regex(es) should ideally be optional.
Peter Boughton
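A sketch of optional spaces, assuming "optional" means any amount of whitespace after each comma (`, ?` would allow at most one space instead). `LOOSE_LIST`, `HAS_DUPLICATE` and `isValidLooseList` are hypothetical names. Only the DOES match pattern changes; the DOES NOT pattern already spans any separator through `.+?`.

```javascript
// ,\s* accepts "Dog,Cat", "Dog, Cat" and "Dog,  Cat" alike.
const LOOSE_LIST = /^(?:Dog|Cat|Bird|Mouse)(?:,\s*(?:Dog|Cat|Bird|Mouse))*$/;
// Unchanged duplicate check: .+? matches whatever separator sits between repeats.
const HAS_DUPLICATE = /(Dog|Cat|Bird|Mouse).+?\1/;

function isValidLooseList(input) {
  return LOOSE_LIST.test(input) && !HAS_DUPLICATE.test(input);
}

console.log(isValidLooseList("Dog,Cat"));   // true
console.log(isValidLooseList("Dog,  Dog")); // false
```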
A: 

J-P, I tried editing your sample regular expressions so that I could look for duplicates in any comma-separated string. Something like this:

var valid = string.match( /(?:(?:^|, )([a-z]*))+$/ ) &&
    !string.match( /([a-z]*).+?\1/ );

Unfortunately, I failed. The Force is weak with me. ;)
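The attempt above fails for two reasons. `[a-z]*` can match an empty string, so `\1` can be empty and `([a-z]*).+?\1` matches any string of at least one character, flagging every list as a duplicate; and `[a-z]` excludes capitals, so a value such as "Dog" is never captured as a whole word. A sketch of a generic version, assuming the items are single words of ASCII letters separated by ", ": `+` forbids an empty capture, `[A-Za-z]` allows capitals, and `\b` makes a value count as repeated only when it appears again as a whole word. `GENERIC_LIST`, `GENERIC_DUPLICATE` and `isValidGenericList` are hypothetical names.

```javascript
// Any list of words (ASCII letters only) separated by ", ".
const GENERIC_LIST = /^[A-Za-z]+(?:, [A-Za-z]+)*$/;
// + (not *) forbids an empty capture; \b makes \1 match whole words only.
const GENERIC_DUPLICATE = /\b([A-Za-z]+)\b.*\b\1\b/;

function isValidGenericList(input) {
  return GENERIC_LIST.test(input) && !GENERIC_DUPLICATE.test(input);
}

console.log(isValidGenericList("Dog, Cat, Dogfish")); // true
console.log(isValidGenericList("Cat, Dog, Dog"));     // false
```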

Thanks again for your help.

Kuyenda