What you are trying to achieve is pretty tough with regular expressions, since there is no way to express “replace strings not matching a pattern”. You will have to use a “positive” pattern, telling what to match instead of what not to match.
Furthermore, you want to replace every character with a replacement character, so you have to make sure that your pattern matches exactly one character. Otherwise, you will replace whole strings with a single character, returning a shorter string.
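To see the difference, here is a small sketch in Python (chosen only for illustration, since no language is given; the sample string is made up). A pattern that matches a whole run shortens the string, while a pattern that matches exactly one character keeps its length:

```python
import re

text = "xxabcxx"  # made-up sample input

# Matching a whole run replaces the run with ONE replacement character.
collapsed = re.sub(r"x+", "*", text)

# Matching exactly one character replaces character by character.
preserved = re.sub(r"x", "*", text)

print(collapsed)  # *abc*
print(preserved)  # **abc**
```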
For your toy example, you can use negative lookaheads and lookbehinds to achieve the task, but this may be more difficult for real-world examples with longer or more complex strings, since you will have to consider each character of your string separately, along with its context.
Here is the pattern for “not ‘abc’”:
[^abc]|a(?!bc)|(?<!a)b|b(?!c)|(?<!ab)c
It consists of five sub-patterns, connected with “or” (`|`), each matching exactly one character:
- `[^abc]` matches every character except `a`, `b` or `c`
- `a(?!bc)` matches `a` if it is not followed by `bc`
- `(?<!a)b` matches `b` if it is not preceded by `a`
- `b(?!c)` matches `b` if it is not followed by `c`
- `(?<!ab)c` matches `c` if it is not preceded by `ab`
The idea is to match every character that does not occur in your target word `abc`, plus every character of the word that, according to its context, is not part of an occurrence of the word. The context can be examined using negative lookaheads `(?!...)` and lookbehinds `(?<!...)`.
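As a quick check of the pattern, here is a minimal sketch using Python's `re` module. The language is an assumption, and `mask_except_abc` and the `*` replacement character are made-up names and values for illustration. Python only supports fixed-width lookbehinds, which is all this pattern needs:

```python
import re

# The five single-character alternatives for "not 'abc'".
NOT_ABC = re.compile(r"[^abc]|a(?!bc)|(?<!a)b|b(?!c)|(?<!ab)c")

def mask_except_abc(text: str, fill: str = "*") -> str:
    """Replace every character that is not part of an "abc" occurrence."""
    # Lookarounds inspect the original text, so earlier replacements
    # do not influence later matches.
    return NOT_ABC.sub(fill, text)

print(mask_except_abc("xabcabyabcc"))  # *abc***abc*
```

Since every alternative consumes exactly one character, the result always has the same length as the input.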
You can imagine that this technique will fail once you have a target word containing one character more than once, like `example`. It is pretty hard to express “match `e` if it is not followed by `x` and not preceded by `l`”.
Especially for dynamic patterns, it is far easier to do a positive search first and then, in a second pass, replace every character that was not part of a match, as others have suggested here.
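A minimal sketch of that two-pass idea, again in Python as an assumed language; `mask_except`, its parameters and the sample sentence are hypothetical and not taken from the other answers. It works for any literal word, including ones with repeated characters:

```python
import re

def mask_except(text: str, word: str, fill: str = "*") -> str:
    """Keep every occurrence of `word`; replace all other characters with `fill`."""
    keep = [False] * len(text)
    # First pass: positive search. re.escape treats the word as a literal.
    for m in re.finditer(re.escape(word), text):
        for i in range(m.start(), m.end()):
            keep[i] = True
    # Second pass: replace every character not covered by a match.
    return "".join(ch if k else fill for ch, k in zip(text, keep))

print(mask_except("an example, for example", "example"))
# ***example******example
```

Note that `re.finditer` returns non-overlapping matches only, so a word that can overlap with itself would need extra handling.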