ansaurus

Question

How can I match occurrences of string not in another string using regular expressions?

Answer 1

+2 A:

You can try:

(?:[^@])string(?:[^@])
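
A quick way to see what this pattern does is to run it on a few inputs. The sketch below assumes Python's re module (the question does not name an engine), and the helper name find_first is made up for illustration.

```python
import re

# Pattern from the answer: one non-@ character on each side of "string".
NEIGHBOUR_PATTERN = re.compile(r"(?:[^@])string(?:[^@])")

def find_first(text):
    """Return the text of the first match, or None when nothing matches."""
    m = NEIGHBOUR_PATTERN.search(text)
    return m.group() if m else None

# The neighbouring characters are consumed, so they are part of the match.
print(repr(find_first("a string here")))
# No character before or after "string", so the pattern cannot match.
print(repr(find_first("string")))
# Both neighbours are @, so the pattern does not match.
print(repr(find_first("@string@")))
```

Note that the non-capturing groups do not keep the neighbouring characters out of the match; they only avoid creating numbered groups.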

tinifni 2010-10-27 23:17:40

I believe this fails if there are no characters preceding or following "string"

Mark 2010-10-28 00:05:03

Ahh. Good point. I also realized on my commute home that it would not match @string and string@ as mentioned in the accepted solution. Onward towards better skills!

tinifni 2010-10-28 01:23:07

Answer 2

+4 A:

You can use negative lookahead and lookbehind:

(?<!@)string(?!@)

EDIT

NOTE: As per Mark's comments below, this would not match @string or string@.
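
As a rough check of the lookaround version, here is a minimal sketch assuming Python's re module, which supports both negative lookahead and fixed-width negative lookbehind. The helper name lookaround_matches is hypothetical.

```python
import re

# Negative lookbehind and lookahead: no @ immediately before or after "string".
LOOKAROUND = re.compile(r"(?<!@)string(?!@)")

def lookaround_matches(text):
    """Return True when the pattern is found anywhere in text."""
    return LOOKAROUND.search(text) is not None

for sample in ["a string here", "string", "@string@", "@string", "string@"]:
    print(sample, "->", lookaround_matches(sample))
```

Because lookarounds are zero-width, the match itself covers only the six characters of "string", not its neighbours.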

steinar 2010-10-27 23:21:09

fails to match `@string` or `string@` (an @ on one side, but not both). whether or not this is desirable, i'm not sure.

Mark 2010-10-28 00:06:26

@Mark Yes, but he explicitly states that it should be inside @@.

steinar 2010-10-28 00:16:36

@steinar: yes, that's what I'm saying. `@string` is *not* inside `@@` thus it *should* match.

Mark 2010-10-28 00:30:17

thus the correct regex would actually be `(?<!@)string|string(?!@)`, i.e., it *can* have an @ before, or an @ after, as long as it doesn't have both.

Mark 2010-10-28 00:32:55
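
The alternation from the comment above can be checked the same way. This is only a sketch, assuming Python's re module; the helper name either_side_matches is made up for illustration.

```python
import re

# Match "string" unless it has an @ on BOTH sides:
# either nothing-@ before it, or nothing-@ after it.
EITHER_SIDE = re.compile(r"(?<!@)string|string(?!@)")

def either_side_matches(text):
    """Return True when the pattern is found anywhere in text."""
    return EITHER_SIDE.search(text) is not None

for sample in ["string", "@string", "string@", "@string@"]:
    print(sample, "->", either_side_matches(sample))
```

Only the input with an @ on both sides is rejected.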

Ahh, yes good point. Sorry.

steinar 2010-10-28 00:48:37

Answer 3

+1 A:

OK,

If you want to NOT match a character you put it in a character class (square brackets) and start it with the ^ character which negates it, for example [^a] means any character but a lowercase 'a'.

So if you want NOT at-sign, followed by string, followed by another NOT at-sign, you want

[^@]string[^@]

Now, the problem is that the character classes will each match a character, so in your example we'd get " string ", which includes the leading and trailing whitespace. There's another construct, parens with a ?: at the beginning, (?: ), that groups without capturing. It keeps those characters out of any numbered groups, although they are still consumed as part of the overall match. So you surround the ends with that.

(?:[^@])string(?:[^@])

OK, but now it doesn't match at the start of the input (marked by ^, which, confusingly, is the same character doing double duty outside a character class) or at the end of the input (marked by $). So we have to use the OR character | to say "give me a non-at-sign OR start of input" and, at the end, "give me a non-at-sign OR end of input", like this:

(?:[^@]|^)string(?:[^@]|$)
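
Here is a small sketch of that pattern, assuming Python's re module. The capturing group around string is an addition made for illustration, so that the word can be pulled out without its neighbours; it is not part of the pattern above.

```python
import re

# Same pattern as above, with "string" wrapped in a capturing group
# (an addition for this sketch) so group(1) excludes the neighbours.
ANCHORED = re.compile(r"(?:[^@]|^)(string)(?:[^@]|$)")

def captured(text):
    """Return the captured word from the first match, or None."""
    m = ANCHORED.search(text)
    return m.group(1) if m else None

print(captured("string"))         # matches at the start and end of the input
print(captured(" string "))       # neighbours are consumed but not captured
print(captured("@string@"))       # an @ on both sides: no match

# Caveat: the neighbours are consumed, so two occurrences separated by a
# single character share that character and only the first one is found.
print(ANCHORED.findall("string string"))
```

The last line shows one limitation of this approach compared with lookarounds, which consume nothing and would find both occurrences.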

EDIT: Negative lookbehind and lookahead are a simpler (and clever) solution, but they are not available in all regular expression engines.

Now a follow-up question. If you had the word "astringent" would you still want to match the "string" inside? In other words, does "string" have to be a word by itself? (Despite my initial reaction, this can get pretty complicated :) )
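
If the answer is yes, one possible approach is to add word boundaries to the lookaround version. This is only a sketch, assuming Python's re module and its definition of \b (a boundary between a word character and a non-word character); the helper name whole_word_matches is hypothetical.

```python
import re

# Lookarounds reject a neighbouring @; \b on each side rejects
# neighbouring word characters, as in "astringent" or "strings".
WHOLE_WORD = re.compile(r"(?<!@)\bstring\b(?!@)")

def whole_word_matches(text):
    """Return True when "string" appears as a standalone word without @ neighbours."""
    return WHOLE_WORD.search(text) is not None

for sample in ["a string here", "astringent", "strings", "@string@"]:
    print(sample, "->", whole_word_matches(sample))
```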

Mark Thomas 2010-10-28 00:03:08
