ansaurus

Question

Answer 1

+2 A:

The lookahead should come first:

(\b(?!(the|as)\b)\w+\b)

I have also added word boundaries so that it only matches whole words. Without them, the regex would refuse to match the complete word "as" but would still match the letter "s" of that word.
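To see the effect of the word boundaries, here is a minimal Python sketch. The sample sentence and the variable names are made up for illustration, and the pattern without boundaries is only a guess at what a boundary-free attempt might look like, since the question's original pattern is not shown here.

```python
import re

# Hypothetical sample text, chosen only for illustration.
text = "the cat is as quiet as theory"

# With word boundaries: whole words other than "the" and "as".
with_boundaries = re.compile(r"\b(?!(?:the|as)\b)\w+\b")

# Without word boundaries (assumed variant): a match may start mid-word.
without_boundaries = re.compile(r"(?!the|as)\w+")

bounded_matches = with_boundaries.findall(text)
unbounded_matches = without_boundaries.findall(text)

print(bounded_matches)    # ['cat', 'is', 'quiet', 'theory']
print(unbounded_matches)  # ['he', 'cat', 'is', 's', 'quiet', 's', 'heory']
```

Note that "theory" is kept by the bounded pattern, because the \b inside the lookahead only rejects "the" when it is a complete word.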

You might also want to consider what \w matches and whether that meets your needs. If you are looking for English words, you are probably interested in letters but not digits, and you may wish to include some punctuation characters that \w excludes, such as apostrophes and hyphens. You could try something like this instead (Rubular):

/(\b(?!(?:the|as)\b)[a-z'-]+\b)/i
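As a rough illustration, the same pattern can be tried in Python, where the /i flag becomes re.IGNORECASE. The sample sentence and names below are invented for this sketch.

```python
import re

# Same pattern as above; the trailing /i flag is expressed as re.IGNORECASE.
word_pattern = re.compile(r"\b(?!(?:the|as)\b)[a-z'-]+\b", re.IGNORECASE)

# Hypothetical sample text containing a hyphen, an apostrophe and a digit.
sample = "The well-known dog didn't bark as 2 cats passed"

words = word_pattern.findall(sample)
print(words)
# ['well-known', 'dog', "didn't", 'bark', 'cats', 'passed']
```

The digit is skipped because it is not in the character class, and "The" is rejected despite its capital letter because the match is case-insensitive.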

To match words in a human language more accurately, you could consider using a natural language parsing library instead of regular expressions.

Mark Byers 2010-09-04 19:39:49

Answer 2

+1 A:

You should use word boundaries to match only whole words, either with a look-ahead assertion:

(\b(?!(?:the|as)\b)\w+\b)

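Or with a look-behind assertion (note that "the" and "as" differ in length, so this form needs an engine that accepts variable-length alternatives inside a look-behind):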

(\b\w+\b(?<!\b(?:the|as)))
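Python's re module, for example, rejects the pattern above with a "look-behind requires fixed-width pattern" error. One possible workaround, sketched here as an assumption rather than as part of the original answer, is to split the check into one fixed-width look-behind per excluded word. The sample text and names are hypothetical.

```python
import re

# One fixed-width negative look-behind per excluded word.
# The \b inside each look-behind ensures only the whole word is rejected.
lookbehind_pattern = re.compile(r"\b\w+\b(?<!\bthe)(?<!\bas)")

# Hypothetical sample text; "gas" and "theory" merely contain the stop words.
sample_text = "the gas is as quiet as theory"

kept_words = lookbehind_pattern.findall(sample_text)
print(kept_words)  # ['gas', 'is', 'quiet', 'theory']
```

Words such as "gas" survive because there is no word boundary immediately before the "as" inside them.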

Gumbo 2010-09-04 19:53:07


Regex negation - word parsing
