Say, for example, I have the string "one two(three) (three) four five" and I want to replace "(three)" with "(four)", but not where it is attached to another word. How would I do it?

Basically I want to do a regex replace and end up with the following string:

"one two(three) (four) four five"

I have tried the following regex but it doesn't work:

@"\b\(three\)\b"

Basically I am writing some search and replace code and am giving the user the usual options to match case, match whole word etc. In this instance the user has chosen to match whole words but I don't know what the text being searched for will be.

A: 

Why regex? Simply replace "(three)" with "(four)". Are there any cases other than the one you mentioned that would make a regex necessary?

Gopi
I don't want "two(three)" to be replaced with "(four)", just (three) on its own!
CroweMan
Oh, that wasn't obvious from your question. Sorry, I misinterpreted it.
Gopi
No problem, I have updated the question to give more details
CroweMan
A: 

As Gopi said, but (theoretically) catching only (three) not two(three):

string input = "one two(three) (three) four five";

string output = input.Replace(" (three) ", " (four) ");

When I test that, I get: "one two(three) (four) four five". Just remember that whitespace is a character too, so it can also be replaced. If I did this:

//use same input
string output = input.Replace(" ", ";");

I'd get "one;two(three);(three);four;five"

AllenG
The problem is that the user is entering the text in a find and replace box and they have selected 'match whole words'. So I need to use something intelligent like regular expressions, and I can't just add a " " before or after the expression, as the preceding character could be a ',' or something else.
CroweMan
+3  A: 

Your problem stems from a misunderstanding of what \b actually means. Admittedly, it is not obvious.

The reason \b\(three\)\b doesn’t match the threes in your input string is the following:

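  • \b means: the boundary between a word character and a non-word character (the start or end of the string also counts if the adjacent character is a word character).
  • Letters (e.g. a-z), digits and the underscore are considered word characters.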
  • Punctuation marks such as ( are considered non-word characters.

Here is your input string again, stretched out a bit, and I’ve marked the places where \b matches:

 o n e   t w o ( t h r e e )   ( t h r e e )   f o u r   f i v e
↑     ↑ ↑     ↑ ↑         ↑     ↑         ↑   ↑       ↑ ↑       ↑

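As you can see here, there is a \b between “two” and “(three)”, but not before the second “(three)”, and not after either closing parenthesis, because both “)” and the space that follows it are non-word characters. Since the pattern needs a \b on both sides, it matches neither occurrence.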
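If you want to check this yourself, here is a minimal sketch using the input string from the question. The variable names are only for illustration.

```csharp
using System;
using System.Text.RegularExpressions;

var input = "one two(three) (three) four five";

// No occurrence of "(three)" has a \b on both sides, so this never matches.
bool parenthesized = Regex.IsMatch(input, @"\b\(three\)\b");

// The bare word is different: each "three" sits between parentheses,
// which are non-word characters, so \b matches on both sides.
int bareWord = Regex.Matches(input, @"\bthree\b").Count;

Console.WriteLine(parenthesized); // False
Console.WriteLine(bareWord);      // 2
```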

The moral of the story? “Whole-word search” doesn’t really make much sense if what you’re searching for is not just a word (a string of letters). Since you have punctuation characters (parentheses) in your search string, it is not as such a “word”. If you searched for a word consisting only of word characters, then \b would do what you expect.

You can, of course, use a different Regex to match the string only if it is surrounded by spaces or occurs at the beginning or end of the string:

(^|\s)\(three\)(\s|$)

However, the problem with this is, of course, that if you search for “three” (without the parentheses), it won’t find the one in “(three)” because it doesn’t have spaces around it, even though it is actually a whole word.
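As a rough sketch of how that pattern behaves in a replace, assuming the input string from the question: because the outer groups consume the surrounding whitespace, the replacement has to put it back with $1 and $2.

```csharp
using System;
using System.Text.RegularExpressions;

var input = "one two(three) (three) four five";

// $1 and $2 restore whatever the outer groups matched (whitespace or nothing).
var replaced = Regex.Replace(input, @"(^|\s)\(three\)(\s|$)", "$1(four)$2");
Console.WriteLine(replaced); // one two(three) (four) four five

// The limitation described above: the bare word is never surrounded by
// whitespace in this input, so the same approach does not find it.
bool findsBareWord = Regex.IsMatch(input, @"(^|\s)three(\s|$)");
Console.WriteLine(findsBareWord); // False
```

One more caveat: since the whitespace is part of each match, two occurrences separated by a single space would not both be matched. Lookarounds such as (?<!\S) and (?!\S) would avoid that, because they check the neighbouring characters without consuming them.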

I think most text editors (including Visual Studio) will use \b only if your search string actually starts and/or ends with a word character:

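// Requires: using System.Text.RegularExpressions;
// Escape the user's text so that characters like ( and ) are matched literally.
var pattern = Regex.Escape(searchString);
// Add \b only on a side where the search text starts or ends with a word character.
if (Regex.IsMatch(searchString, @"^\w"))
    pattern = @"\b" + pattern;
if (Regex.IsMatch(searchString, @"\w$"))
    pattern = pattern + @"\b";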

That way they will find “(three)” even if you select “whole words only”.
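To show how this might slot into search-and-replace code, here is a minimal sketch. BuildWholeWordPattern and ReplaceWholeWord are hypothetical helper names of my own, not anything provided by .NET or Visual Studio, and I am only guessing that editors behave this way.

```csharp
using System;
using System.Text.RegularExpressions;

// Hypothetical helper: builds a "whole word" pattern from arbitrary user text.
string BuildWholeWordPattern(string searchString)
{
    var pattern = Regex.Escape(searchString);
    if (Regex.IsMatch(searchString, @"^\w"))
        pattern = @"\b" + pattern;
    if (Regex.IsMatch(searchString, @"\w$"))
        pattern = pattern + @"\b";
    return pattern;
}

// Hypothetical helper: the lambda inserts the replacement literally,
// so a "$1" typed by the user is not treated as a group reference.
string ReplaceWholeWord(string input, string searchString, string replacement)
{
    return Regex.Replace(input, BuildWholeWordPattern(searchString), _ => replacement);
}

var first = ReplaceWholeWord("one two(three) (three) four five", "(three)", "(four)");
var second = ReplaceWholeWord("four fourteen", "four", "4");

Console.WriteLine(first);  // one two(four) (four) four five
Console.WriteLine(second); // 4 fourteen
```

Note that with this approach “two(three)” is replaced as well, because no \b is added around a search string that starts and ends with punctuation.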

Timwi
It possibly doesn't make sense, but that is how I would like it to work. Have you got any ideas how I could do this? Basically I would like to mimic the find and replace functionality within Visual Studio.
CroweMan
@CroweMan: You are contradicting yourself. You said, “I don't want "two(three)" to be replaced”, but Visual Studio does.
Timwi
Thank you very much. You are a star!
CroweMan