I'm trying to match all occurrences of "string" in a sequence like the following, except those enclosed in @ signs (@...@):

as87dio u8u u7o @string@ ou os8 string os u

i.e. the second occurrence should be matched, but not the first.

Can anyone give me a solution?

+2  A: 

You can try:

(?:[^@])string(?:[^@])
tinifni
I believe this fails if there are no characters preceding or following "string".
Mark
Ahh. Good point. I also realized on my commute home that it would not match @string and string@ as mentioned in the accepted solution. Onward towards better skills!
tinifni
+4  A: 

You can use negative lookahead and lookbehind:

(?<!@)string(?!@)

EDIT

NOTE: As per Mark's comments below, this would not match @string or string@.
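
For anyone who wants to try this out, here is a minimal sketch using Python's `re` module (the choice of Python is an assumption; the question does not name a language or engine). It runs the pattern against the sample from the question and against the two one-sided cases mentioned in the note.

```python
import re

# Pattern from this answer: "string" with no @ immediately before or after it.
pattern = re.compile(r"(?<!@)string(?!@)")

sample = "as87dio u8u u7o @string@ ou os8 string os u"

# Collect the (start, end) span of every match in the sample.
spans = [m.span() for m in pattern.finditer(sample)]
print(spans)  # → [(32, 38)], i.e. only the second occurrence

# One-sided cases: an @ on either side is enough to block the match.
print(pattern.findall("@string"))  # → []
print(pattern.findall("string@"))  # → []
```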

steinar
Fails to match `@string` or `string@` (an @ on one side, but not both). Whether or not this is desirable, I'm not sure.
Mark
@Mark Yes, but he explicitly states that it should be inside @@.
steinar
@steinar: yes, that's what I'm saying. `@string` is *not* inside `@@` thus it *should* match.
Mark
Thus the correct regex would actually be `(?<!@)string|string(?!@)`, i.e., it *can* have an @ before, or an @ after, as long as it doesn't have both.
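
A quick way to check that alternation, sketched in Python's `re` module (an assumed engine; the four test inputs are made up for illustration):

```python
import re

# Alternation from the comment: reject only when @ is on BOTH sides.
pattern = re.compile(r"(?<!@)string|string(?!@)")

# Hypothetical test inputs covering each combination of neighbouring @ signs.
cases = ["@string@", "@string", "string@", "string"]
results = {text: bool(pattern.search(text)) for text in cases}

for text, matched in results.items():
    print(f"{text!r}: {matched}")
# '@string@': False
# '@string': True
# 'string@': True
# 'string': True
```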
Mark
Ahh, yes good point. Sorry.
steinar
+1  A: 

OK,

If you want to NOT match a character you put it in a character class (square brackets) and start it with the ^ character which negates it, for example [^a] means any character but a lowercase 'a'.

So if you want NOT at-sign, followed by string, followed by another NOT at-sign, you want

[^@]string[^@]

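Now, the problem is that the character classes will each match a character, so in your example we'd get " string " which includes the leading and trailing whitespace. There's another construct, parens with a ?: at the beginning, (?: ), which is a non-capturing group. It does not remove those characters from the match (they are still consumed), but it lets us group alternatives in the next step without creating extra captures. To pull out just the word, you can put "string" itself in a capturing group. So you surround the ends with the non-capturing group.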

(?:[^@])string(?:[^@])

OK, but now it doesn't match when "string" sits at the very start of the input (the ^ anchor, which confusingly is the same character doing double duty outside a character class) or at the very end (the $ anchor). So we have to use the OR character | to say "give me a non-at-sign OR start of input" and at the end "give me a non-at-sign OR end of input" like this:

(?:[^@]|^)string(?:[^@]|$)
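
Here is a rough sketch of how that behaves in practice, using Python's `re` module (an assumed engine). The capturing group around `string` is an addition for illustration, not part of the pattern above; it is there because the neighbouring characters are still part of the overall match. The last lines show one caveat worth knowing: because each neighbour is consumed, two occurrences separated by a single character cannot both match.

```python
import re

# Same pattern as above, plus a capturing group (an illustrative addition)
# so that "string" can be extracted without its neighbours.
pattern = re.compile(r"(?:[^@]|^)(string)(?:[^@]|$)")

sample = "as87dio u8u u7o @string@ ou os8 string os u"

m = pattern.search(sample)
whole = m.group(0)  # overall match, neighbours included
word = m.group(1)   # just the captured word
print(repr(whole))  # → ' string '
print(repr(word))   # → 'string'

# Caveat: the space after the first "string" is consumed by the first match,
# so nothing is left to serve as the "non-@" before the second occurrence.
adjacent = pattern.findall("string string")
print(adjacent)  # → ['string']
```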

EDIT: The negative lookbehind and lookahead approach is a simpler (and clever) solution, but it is not available in all regular expression engines.

Now a follow-up question. If you had the word "astringent" would you still want to match the "string" inside? In other words, does "string" have to be a word by itself? (Despite my initial reaction, this can get pretty complicated :) )
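
If the answer is yes, one possible approach is to add word boundaries. The sketch below assumes Python's `re` module and combines `\b` with the lookaround pattern from the other answer; the input text is made up for illustration.

```python
import re

# Assumption: "string" must stand alone as a word.
# \b asserts a boundary between a word character and a non-word character.
pattern = re.compile(r"(?<!@)\bstring\b(?!@)")

# Hypothetical input: the "string" inside "astringent" has no word boundary
# on either side, so only the standalone word is matched.
found = pattern.findall("astringent string")
print(found)  # → ['string']

# The @...@ exclusion from the original question still applies.
print(pattern.findall("@string@"))  # → []
```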

Mark Thomas