Hi everyone,

I have a problem with a regex. I need a regex that excludes a set of specified words, for example: apple, orange, juice. Given these words, it should match everything except exactly those words.

apple (should not match)
applejuice (match)
yummyjuice (match)
yummy-apple-juice (match)
orangeapplejuice (match)
orange (should not match)
juice (should not match)
orange-apple-juice (match)
apple-orange-aple (match)
juice-juice-juice (match)
orange-juice (match)

Thank you for any of your help! :)

A: 

Something like (PHP)

$input = "The orange apple gave juice";
if (preg_match("your regex for validating", $input) && !preg_match("/apple|orange|juice/", $input))
{
  // it's ok;
}
else
{
  //throw validation error
}
Ben Fransen
Except that will match `applejuice` and therefore throw a validation error.
gnarf
+1  A: 

If you really want to do this with a single regular expression, you may find lookarounds helpful (especially negative lookahead in this example). Here is the regex written for Ruby (some implementations have different syntax for lookarounds):

rx = /^(?!apple$|orange$|juice$)/
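
As a quick check, here is a minimal sketch that runs this pattern against the examples from the question. The constant names (RX, SHOULD_MATCH, SHOULD_NOT_MATCH) are made up for illustration, and it assumes Ruby 2.4 or later for match?:

```ruby
# The pattern from this answer: match at the start of a line unless the
# rest of that line is exactly one of the forbidden words.
RX = /^(?!apple$|orange$|juice$)/

# Sample inputs copied from the question.
SHOULD_NOT_MATCH = %w[apple orange juice]
SHOULD_MATCH = %w[applejuice yummyjuice yummy-apple-juice orangeapplejuice
                  orange-apple-juice juice-juice-juice orange-juice]

(SHOULD_NOT_MATCH + SHOULD_MATCH).each do |word|
  puts "#{word}: #{RX.match?(word) ? 'match' : 'no match'}"
end
```

Note that the pattern only asserts a position and consumes nothing, so it is suited to a yes/no test rather than to extracting the matched text.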
MBO
A: 

Sounds like you want to treat the hyphen as a word character.

fenway
+1  A: 

I noticed that apple-juice should match according to your parameters, but what about apple juice? I'm assuming that if you are validating apple juice you still want it to fail.

So - let's build a set of characters that count as a "boundary":

/[^-a-z0-9A-Z_]/        // Will match any character that is <NOT> - _ or 
                        // between a-z 0-9 A-Z 

/(?:^|[^-a-z0-9A-Z_])/  // Matches the beginning of the string, or one of those 
                        // non-word characters.

/(?:[^-a-z0-9A-Z_]|$)/  // Matches a non-word or the end of string

/(?:^|[^-a-z0-9A-Z_])(apple|orange|juice)(?:[^-a-z0-9A-Z_]|$)/ 
   // This should >match< apple/orange/juice ONLY when it is delimited on both sides by
   // a 'non-word' character or the start/end of the string. Just negate the result of
   // the test to obtain your desired result.

In most regexp flavors, \b counts as a "word boundary", but the standard list of "word characters" doesn't include -, so you need to create a custom one. You could match with /\b(apple|orange|juice)\b/ if you weren't trying to treat - as a word character as well...

If you are only testing single-word inputs, you can go with a much simpler pattern:

/^(apple|orange|juice)$/ // and take the negation of this...
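
To illustrate the "match, then negate" idea, here is a rough sketch in Ruby (the language used in another answer here). The constant FORBIDDEN_WORD and the helper acceptable? are hypothetical names, and the sample inputs are only examples:

```ruby
# The custom-boundary pattern from above: a forbidden word delimited by the
# start/end of a line or by a character outside [-a-zA-Z0-9_].
FORBIDDEN_WORD = /(?:^|[^-a-z0-9A-Z_])(apple|orange|juice)(?:[^-a-z0-9A-Z_]|$)/

# Hypothetical helper: the input is acceptable when the pattern does NOT match.
def acceptable?(input)
  !FORBIDDEN_WORD.match?(input)
end

["apple", "apple juice", "apple-juice", "applejuice",
 "The orange apple gave juice"].each do |s|
  puts "#{s.inspect}: #{acceptable?(s) ? 'ok' : 'validation error'}"
end
```

Here "apple-juice" and "applejuice" pass because the hyphen and the letters count as word characters, while "apple juice" fails because the space acts as a boundary.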
gnarf
A: 

This gets some of the way there:

((?:apple|orange|juice)\S)|(\S(?:apple|orange|juice))|(\S(?:apple|orange|juice)\S)
Antony Carthy
A: 
\A(?!apple\Z|juice\Z|orange\Z).*\Z

will match an entire string unless it only consists of one of the forbidden words.

Alternatively, if you're not using Ruby, or you're sure that your strings contain no line breaks, or you have set the option so that ^ and $ do not match at the beginnings/ends of lines,

^(?!apple$|juice$|orange$).*$

will also work.
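
To show why the choice of anchors matters in Ruby, here is a small sketch comparing the two patterns. The constant names WHOLE_STRING and PER_LINE and the sample string are made up for this example:

```ruby
# Whole-string anchors, as in the first pattern of this answer.
WHOLE_STRING = /\A(?!apple\Z|juice\Z|orange\Z).*\Z/
# Line anchors, as in the second pattern.
PER_LINE = /^(?!apple$|juice$|orange$).*$/

# On single-line input the two behave the same.
puts WHOLE_STRING.match?("apple")        # false
puts PER_LINE.match?("apple")            # false
puts WHOLE_STRING.match?("applejuice")   # true
puts PER_LINE.match?("applejuice")       # true

# With a line break, Ruby's ^ and $ anchor at every line, so the second
# pattern skips the forbidden first line and matches the next one.
multi = "apple\npie"
p multi[PER_LINE]       # "pie"
p multi[WHOLE_STRING]   # nil
```

The whole-string version returns nil for the two-line input because . does not match a line break by default, so .* cannot reach the end of the string.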

Tim Pietzcker
A: 

Hi, it's me again.. Agustinus.. I was using anonymous.. I just want to say that Tim's and gnarf's solution works.. ^^ Thank you!