I have the following regular expression, which I am compiling with the Pattern class:

\bIntegrated\s+Health\s+System\s+\(IHS\)\b

Why is this not matching this string?

"test pattern case Integrated Health System (IHS)."

If I try \bpattern\b, it seems to work, but for the above phrase it does not. I have the parentheses in the pattern escaped, so I'm not sure why it doesn't work. It does match if I remove the parenthesized portion of the pattern, but I want to match the whole thing.

A: 

You've got (IHS) - a group - where you want \(IHS\) as the literal brackets.

cyborg
A: 

You need to escape the parentheses

\bIntegrated\s+Health\s+System\s+\(IHS\)\b

Parentheses delimit a capture group. To match a literal set of parentheses, you can escape them like this \( \)

mopoke
+1  A: 

1) Escape the parens, otherwise they are capturing-group metacharacters, not literal parentheses: \( \)

2) Remove the final \b. A word boundary only matches between a word character and a non-word character. In your string the ) is followed by a period, and neither is a word character, so \b cannot match there.

\bIntegrated\s+Health\s+System\s+\(IHS\)\W
Paul Creasey
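To make the second point concrete, here is a minimal sketch that runs the question's string against the pattern with a trailing \b and with a trailing \W. The class name WordBoundaryDemo and the helper found are made up for illustration; only java.util.regex.Pattern comes from the question.

```java
import java.util.regex.Pattern;

public class WordBoundaryDemo {
    // Hypothetical helper: true if the regex matches anywhere in the input.
    public static boolean found(String regex, String input) {
        return Pattern.compile(regex).matcher(input).find();
    }

    public static void main(String[] args) {
        String input = "test pattern case Integrated Health System (IHS).";
        String base = "\\bIntegrated\\s+Health\\s+System\\s+\\(IHS\\)";

        // ')' and '.' are both non-word characters, so there is no
        // word boundary between them and the trailing \b fails.
        System.out.println(found(base + "\\b", input)); // false

        // \W consumes the '.', so this variant matches.
        System.out.println(found(base + "\\W", input)); // true
    }
}
```

Note that \W consumes a character, so the variant with \W will not match when the phrase is the very last thing in the input.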
Okay, how do I indicate the trailing boundary then, so that it does not match something like \bIntegrated\s+Health\s+System\s+\(IHS\)testing? I need to make sure it only matches the whole phrase and not some string that starts with this phrase.
Eqbal
You could use \W, which is the same as [^\w], i.e. [^a-zA-Z0-9_] by default in Java, or you could create your own character class (or negated class) to specify what does or does not indicate a match. I've updated the example with \W, which will likely work pretty well.
Paul Creasey
Thanks, \W seems to work pretty well so far combined with grouping to extract the matched phrase minus the non-word character that follows.
Eqbal
If you want to allow the match at the end of the string, you would have to say `($|\W)`. I'm not sure it's so important though: are you likely to have strings like `Integrated Health Systems (IHS)foo`? The close bracket is almost invariably followed by a space or punctuation.
bobince
Okay, here is my final regex pattern: `"(\\b|\\W)(" + phrase + ")($|\\W)"`. I'm using group 2 to get the matched phrase.
Eqbal
Hmm. That causes a problem if the phrase begins with a "(". So I modified it to `"(^|\\W)(" + phrase + ")($|\\W)"`
Eqbal
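For reference, the final pattern from the comments can be sketched as below. The class name PhraseMatcher and the method findPhrase are hypothetical, and the sketch assumes phrase is already a valid regex fragment with literal parentheses escaped, as in the question.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class PhraseMatcher {
    // Hypothetical helper: returns the matched phrase (group 2),
    // or null when the phrase does not occur as a whole unit.
    public static String findPhrase(String phrase, String input) {
        // Group 1: start of input or a non-word character before the phrase.
        // Group 2: the phrase itself.
        // Group 3: end of input or a non-word character after the phrase.
        Pattern p = Pattern.compile("(^|\\W)(" + phrase + ")($|\\W)");
        Matcher m = p.matcher(input);
        return m.find() ? m.group(2) : null;
    }

    public static void main(String[] args) {
        String phrase = "Integrated\\s+Health\\s+System\\s+\\(IHS\\)";

        // Followed by '.', a non-word character: the phrase is found.
        System.out.println(findPhrase(phrase,
                "test pattern case Integrated Health System (IHS)."));

        // Followed by a word character: no match, prints null.
        System.out.println(findPhrase(phrase,
                "Integrated Health System (IHS)testing"));

        // A phrase that starts with "(" at the start of the input.
        System.out.println(findPhrase("\\(IHS\\)", "(IHS)"));
    }
}
```

If the phrase is meant to be plain text rather than a regex, wrapping it in Pattern.quote(phrase) would avoid escaping the parentheses by hand, at the cost of losing \s+ between the words.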