If I have, for example, this list of keywords:

string keywords = "(shoes|shirt|pants)";

I need to find the whole words in a string. I thought this code would do that:

if (Regex.Match(content, keywords + "\\s+", RegexOptions.Singleline | RegexOptions.IgnoreCase).Success)

But for some reason it returns true for "participants", even though I only want the whole word "pants".
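To see where that match comes from, here is a small C# sketch (assuming C# 9 or later for top-level statements) that runs the question's pattern, keywords + "\\s+", against two made-up inputs. The pattern only asks for a keyword followed by whitespace, so the "pants " at the end of "participants " satisfies it. It also misses a whole word at the very end of the input, where no whitespace follows.

```csharp
using System;
using System.Text.RegularExpressions;

// The pattern from the question: a keyword followed by one or more whitespace characters.
const string keywords = "(shoes|shirt|pants)";
const string questionPattern = keywords + "\\s+";

bool QuestionMatch(string content) =>
    Regex.Match(content, questionPattern, RegexOptions.Singleline | RegexOptions.IgnoreCase).Success;

// "participants " ends in "pants" followed by a space, so the match succeeds
// even though "pants" is not a whole word here.
Console.WriteLine(QuestionMatch("participants are welcome")); // True

// A whole word at the very end of the input has no trailing whitespace, so it is missed.
Console.WriteLine(QuestionMatch("I bought new pants"));        // False
```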

+8  A: 

You should add word boundaries (\b) to your regex:

\b(shoes|shirt|pants)\b

In code:

Regex.Match(content, @"\b(shoes|shirt|pants)\b");
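A minimal runnable version, assuming C# 9 top-level statements. The sample sentences and the HasWholeWord name are made up for illustration, and RegexOptions.IgnoreCase is carried over from the question rather than from this answer. Because \b is zero-width, it also succeeds at the start and end of the input and next to punctuation.

```csharp
using System;
using System.Text.RegularExpressions;

// \b is a zero-width anchor: it matches between a word character (\w) and a
// non-word character, or at the start/end of the input next to a word character.
const string pattern = @"\b(shoes|shirt|pants)\b";

bool HasWholeWord(string content) =>
    Regex.IsMatch(content, pattern, RegexOptions.IgnoreCase);

Console.WriteLine(HasWholeWord("participants are welcome")); // False: "pants" is preceded by "i"
Console.WriteLine(HasWholeWord("I bought new pants"));       // True: \b also matches at the end of input
Console.WriteLine(HasWholeWord("Shirt, shoes and socks"));   // True: punctuation counts as a boundary
```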
Philippe Leybaert
A: 

Put a word boundary on each side of it using the \b metasequence.

David in Dakota
A: 

You need a zero-width assertion on each side checking that the character before the word and the character after it are not word characters:

(?<=(\W|^))(shoes|shirt|pants)(?=(\W|$))

As others suggested, \b works instead of (?<=(\W|^)) and (?=(\W|$)), even when the word is at the beginning or end of the input string: \b matches there as long as the adjacent character is a word character, which is always true for these keywords.
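For comparison, a sketch of the lookaround idea written with a negative lookbehind and lookahead, (?<!\w) and (?!\w), which succeed at the edges of the input without needing ^ and $ alternatives. It checks that pattern against \b on a few made-up inputs (C# 9 top-level statements assumed). Since every keyword starts and ends with a word character, the two should agree.

```csharp
using System;
using System.Text.RegularExpressions;

// (?<!\w) : the character before the keyword, if any, is not a word character.
// (?!\w)  : the character after the keyword, if any, is not a word character.
// Both are zero-width, and both succeed at the start/end of the input.
const string lookaround = @"(?<!\w)(shoes|shirt|pants)(?!\w)";
const string boundary   = @"\b(shoes|shirt|pants)\b";

string[] samples = { "pants", "participants", "pants!", "my shoes", "shirts", "(shirt)" };

foreach (var s in samples)
{
    bool a = Regex.IsMatch(s, lookaround);
    bool b = Regex.IsMatch(s, boundary);
    // Every keyword starts and ends with a word character, so the two patterns agree.
    Console.WriteLine($"{s,-14} lookaround={a} boundary={b}");
}
```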

richardtallent
A: 

Try

Regex.Match(content, @"\b" + keywords + @"\b", RegexOptions.Singleline | RegexOptions.IgnoreCase)

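\b matches at a word boundary: the position between a word character (\w) and a non-word character, or between a word character and the start or end of the string.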
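If the keywords come from a list rather than a hard-coded string, the same concatenation can be wrapped in a helper. BuildWholeWordPattern below is a hypothetical name, and the sketch assumes C# 9 top-level statements. Regex.Escape keeps a keyword containing metacharacters from being read as regex syntax, and the parentheses are what make both \b anchors apply to every alternative: without them, \bshoes|shirt|pants\b parses as the three alternatives \bshoes, shirt and pants\b.

```csharp
using System;
using System.Linq;
using System.Text.RegularExpressions;

// Hypothetical helper: builds \b(kw1|kw2|...)\b from a plain list of keywords.
// Regex.Escape stops metacharacters such as '.' or '+' from acting as regex syntax.
// Note: \b only acts as a whole-word check when each keyword starts and ends
// with a word character.
string BuildWholeWordPattern(params string[] words) =>
    @"\b(" + string.Join("|", words.Select(Regex.Escape)) + @")\b";

string pattern = BuildWholeWordPattern("shoes", "shirt", "pants");
Console.WriteLine(pattern); // \b(shoes|shirt|pants)\b

string content = "Participants wore Shirts, shoes and PANTS.";
foreach (Match m in Regex.Matches(content, pattern, RegexOptions.IgnoreCase))
{
    // prints "shoes at index 26", then "PANTS at index 36"
    Console.WriteLine($"{m.Value} at index {m.Index}");
}
```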

Ben Lings
\w matches a "word character"; \W matches a non-word character, but it doesn't match at the beginning or end of the input, so this would fail for the first and last word in a given input string.
richardtallent
Yep - realized I'd got it wrong as soon as I posted it...
Ben Lings
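The point in the comments can be checked with a short sketch (C# 9 top-level statements assumed, inputs made up): \W has to consume a real non-word character, so \W(shoes|shirt|pants)\W fails when the keyword is the first or last thing in the input, whereas \b is a position check and does not.

```csharp
using System;
using System.Text.RegularExpressions;

// \W consumes an actual non-word character, so it cannot match at the
// start or end of the input; \b is a zero-width position check and can.
const string nonWordChars = @"\W(shoes|shirt|pants)\W";
const string boundaries   = @"\b(shoes|shirt|pants)\b";

foreach (var input in new[] { "pants", "new pants", "pants here", "my pants here" })
{
    bool viaNonWord = Regex.IsMatch(input, nonWordChars);
    bool viaBoundary = Regex.IsMatch(input, boundaries);
    Console.WriteLine($"{input,-14} \\W: {viaNonWord,-5}  \\b: {viaBoundary}");
}
```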