+1  Q: 

Regex keywords

Hello,

I'm trying to use Regex in C# to look for a list of keywords in a bunch of text. However, I want to be very specific about what the "surrounding" text can be for something to count as a keyword.

So for example, the keyword "hello" should be found in (hello), hello., hello< but not in hellothere.

My main problem is that I don't REQUIRE the separators: if the keyword is the first word or the last word, that's okay. Put another way, the beginning of the file and the end of the file should count as acceptable separators.

I'm new to Regex so I was hoping someone could help me get the pattern right. So far I have:

[ <(.]+?keyword[<(.]+?

where <, (, . are some example separators and keyword is of course the keyword I'm looking for.

Thanks in advance

A: 

I think you want something like:

(^|[ <(.])keyword($|[ <(.])

The ^ and $ chars symbolise the start and end of the input text, respectively. (If you specify the Multiline option, it matches to the start/end of the line rather than text, but you would seem to want the Singleline option.)
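
As a rough illustration of the separator-or-anchor idea in C#, here is a minimal sketch. The class and method names are hypothetical, and the separator set is an assumption: it takes the examples from the question and adds `)` and `>` so that a case like (hello) is covered on both sides. It uses a plain `^` for the start-of-input alternative and escapes the keyword so that any regex metacharacters in it are taken literally.

```csharp
using System;
using System.Text.RegularExpressions;

public static class SeparatorMatch
{
    // Hypothetical helper: true if `keyword` occurs in `text` with a separator
    // character, or the start/end of the input, on each side.
    public static bool ContainsKeyword(string text, string keyword)
    {
        // Assumed separator set: space, <, >, (, ) and a literal dot.
        // Inside a character class these characters need no escaping.
        const string separators = "[ <>().]";

        string pattern =
            "(?:^|" + separators + ")" +   // start of input or a separator
            Regex.Escape(keyword) +        // the keyword, taken literally
            "(?:$|" + separators + ")";    // end of input or a separator

        // No RegexOptions.Multiline here, so ^ only matches at the start of the text.
        return Regex.IsMatch(text, pattern);
    }
}
```

With this sketch, `SeparatorMatch.ContainsKeyword("hello.", "hello")` returns true and `SeparatorMatch.ContainsKeyword("hellothere", "hello")` returns false. Note that the match consumes the separator characters, which matters if you later need the exact position of the keyword itself.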

Noldorin
Singleline lets the '.' (dot) metacharacter match line-separator characters (\r and \n); it would have no effect on this regex, since the only dots are in character classes, where they would just match dots anyway.
Alan Moore
@Alan: My point was only that the Singleline/Multiline option changes the meaning of `^` and `$`, not `.`. The question states that the OP specifically wants to detect `.` as a separator.
Noldorin
It sounds like you're thinking of Singleline and Multiline as the opposing states of a single toggle setting. The names seem to imply as much, but in fact they're completely independent: Singleline changes the meaning of '.' and Multiline changes the meaning of '^' and '$'. "Singleline" always was an unfortunate name; some flavors call it DOTALL mode, which is much more descriptive.
Alan Moore
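
The independence of the two options can be checked with a small C# sketch. The class name, method name and sample text below are made up for illustration; only `Regex.IsMatch` and the `RegexOptions` values are real .NET API.

```csharp
using System;
using System.Text.RegularExpressions;

public static class OptionsDemo
{
    // Illustrative sample: "hello" starts the second line, not the input.
    public const string Text = "first\nhello";

    // Hypothetical helper: runs `pattern` against the sample text with the given options.
    public static bool Matches(string pattern, RegexOptions options)
    {
        return Regex.IsMatch(Text, pattern, options);
    }
}

// Expected behaviour:
//   Matches("^hello",      RegexOptions.None)       -> false (^ is start of input only)
//   Matches("^hello",      RegexOptions.Multiline)  -> true  (^ also matches after '\n')
//   Matches("^hello",      RegexOptions.Singleline) -> false (Singleline does not affect ^)
//   Matches("first.hello", RegexOptions.None)       -> false ('.' does not match '\n')
//   Matches("first.hello", RegexOptions.Singleline) -> true  ('.' now matches '\n')
```

Both options can be combined with `RegexOptions.Multiline | RegexOptions.Singleline`, since each one changes a different metacharacter.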
A: 

You will want to look into the word boundary (\b) to avoid matching keywords that appear as a part of another word (as in your hellothere example).

You can also add matching at beginning of line (^) and end of line ($) to control the position where keywords may appear.

Fredrik Mörk
+2  A: 

You could use the word boundary anchor:

\bkeyword\b

which would find your keyword only when not part of a larger word.

Joey
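
Since the question mentions a list of keywords, here is a minimal C# sketch of how the word-boundary pattern might be built for several keywords at once. The class and method names are hypothetical. Each keyword is passed through `Regex.Escape` and the results are joined into a single alternation between two `\b` anchors.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;
using System.Text.RegularExpressions;

public static class KeywordFinder
{
    // Hypothetical helper: returns every whole-word occurrence of any keyword, in order.
    public static List<string> FindKeywords(string text, IEnumerable<string> keywords)
    {
        // Escape each keyword so characters such as '.' or '+' are matched literally,
        // then join them into one alternation: \b(?:hello|world)\b
        string alternation = string.Join("|", keywords.Select(k => Regex.Escape(k)));
        var regex = new Regex(@"\b(?:" + alternation + @")\b");

        return regex.Matches(text)
                    .Cast<Match>()
                    .Select(m => m.Value)
                    .ToList();
    }
}
```

For the text `(hello), hello. hello< hellothere world` and the keywords `hello` and `world`, this returns `hello` three times followed by `world`; the `hello` inside `hellothere` is skipped. One caveat: `\b` treats letters, digits and the underscore as word characters, so `hello_there` and `hello2` would not count as matches, and a keyword that starts or ends with a non-word character (for example `C#`) may need explicit separators or lookarounds instead.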