ansaurus

Question

How can I prevent a specific string pattern from being replaced by Regex.Replace()?

Answer 1

+2 A:

If you're trying to do something in the context of HTML syntax, use an HTML parser.

Amber 2010-05-06 10:15:04

Answer 2

A:

Put each line of text into a string A.

Remove the part between <a> and </a> and store it in string B.

Run your regex on the remaining text in string A.

Return A + B.

Peter McGrattan 2010-05-06 10:21:17
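As the comments on this answer point out, concatenating A + B moves the links to the end. Below is a minimal C# sketch of a variation that keeps them in place. It assumes simple, non-nested `<a ...>...</a>` elements. The text is split on the link blocks using a capturing group, which makes `Regex.Split` keep the links in the resulting array. The replacement then runs only on the plain-text pieces, and the pieces are joined back in their original order. `ReplaceOutsideLinks` is a hypothetical helper name, and the second link in the sample string is made up for illustration.

```csharp
using System;
using System.Text.RegularExpressions;

// Replace a whole word everywhere except inside <a ...>...</a> blocks,
// keeping each link block at its original position.
// Assumes simple, non-nested link elements.
static string ReplaceOutsideLinks(string input, string word, string replacement)
{
    // The capturing group makes Regex.Split keep the link blocks in the array,
    // so the pieces alternate: text, link, text, link, ..., text.
    string[] parts = Regex.Split(input, @"(<a\b[^>]*>.*?</a>)",
                                 RegexOptions.IgnoreCase | RegexOptions.Singleline);
    var wordPattern = new Regex(@"\b" + Regex.Escape(word) + @"\b");

    for (int i = 0; i < parts.Length; i += 2) // even indexes hold the plain text
    {
        parts[i] = wordPattern.Replace(parts[i], replacement);
    }
    return string.Concat(parts);
}

string content = "Pakistan is <a href=\" Pakistan is\">Pakistan an islamic country</a> and Pakistan <a href=\"x\">Pakistan</a>";
Console.WriteLine(ReplaceOutsideLinks(content, "Pakistan", "India"));
// India is <a href=" Pakistan is">Pakistan an islamic country</a> and India <a href="x">Pakistan</a>
```

Because the link blocks never leave the array, their position and count do not matter, which addresses the "more than one `<a></a>` block" case raised below.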

Location of <a></a> tags will be lost.

Taz 2010-05-06 10:23:22

No, it won't. You need to show a simple code sample with some clear sample data.

Peter McGrattan 2010-05-06 10:47:20

You're right, in this string it won't. But `<a></a>` does not necessarily appear at the end, and there can be more than one `<a></a>` block.

Taz 2010-05-06 10:56:21

Yes: so show us some code and some useful test data so we can have a chance of helping you better in all scenarios!

Peter McGrattan 2010-05-06 11:07:19

I have edited the question. I think it is clearer now.

Taz 2010-05-06 11:56:43

Answer 3

+1 A:

Here's how you can do the opposite of what you're asking (replace only the instances inside the tags):

content = Regex.Replace(content, @"(?<=\<\s*a[^>]+)\bPakistan\b(?=.*?\>)", "India");

This is largely untested and not quite what you want, but it may give you some hints. It uses zero-width lookaround assertions. I'm sure there are many other ways to do it.

This is really pushing the limits of regex. You should probably use an HTML parser.

Edit: using negative lookbehind, this appears to work (please test it!):

content = Regex.Replace(content, @"(?<!\<\s*a[^>]+)\bPakistan\b", "India");

Chris Schmich 2010-05-06 10:31:20
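The following small C# check shows what the two lookaround patterns in this answer actually touch, using the sample string from Answer 4. The variable names are illustrative. As far as I can trace it, both patterns only look inside the opening `<a ...>` tag, because `[^>]+` cannot cross the `>` that closes it. As a result, the link text itself is not protected by the negative lookbehind.

```csharp
using System;
using System.Text.RegularExpressions;

// Sample string from Answer 4.
string sample = "Pakistan is <a href=\" Pakistan is\">Pakistan an islamic country</a>";

// Positive lookbehind/lookahead: replace only occurrences inside the opening <a ...> tag.
string positive = Regex.Replace(sample, @"(?<=\<\s*a[^>]+)\bPakistan\b(?=.*?\>)", "India");

// Negative lookbehind: replace every occurrence that is NOT inside the opening tag.
string negative = Regex.Replace(sample, @"(?<!\<\s*a[^>]+)\bPakistan\b", "India");

Console.WriteLine(positive); // Pakistan is <a href=" India is">Pakistan an islamic country</a>
Console.WriteLine(negative); // India is <a href=" Pakistan is">India an islamic country</a>
```

This is likely why Answer 4 says the solution does not work exactly here and switches to `[^<]+`, which also spans the link text up to the closing `</a>`.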

Does the C# regex allow variable-width expressions in negative lookbehinds? Most regex engines that support lookbehinds don't allow variable-width expressions (due to not knowing how far back to step to attempt to match them).

Amber 2010-05-06 10:43:35

My potentially flawed understanding of "zero-width" was that it meant the assertion captured nothing. The .NET regex example at http://msdn.microsoft.com/en-us/library/bs2twtah.aspx#sectionToggle8 appears to use variable-width expressions: "(?<!(Saturday|Sunday) )\b\w+ \d{1,2}, \d{4}\b" (the Saturday/Sunday alternation).

Chris Schmich 2010-05-06 11:03:22

@Dav: .NET is nearly unique among regex flavors in that you can use any expression you like inside a lookbehind. @Chris: it's more correct to say that a zero-width assertion (like a lookbehind) *consumes* nothing. Capturing is something else.

Alan Moore 2010-05-06 12:37:48
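Here is a quick C# check of the point made in these comments, using the pattern quoted from the MSDN page above. The lookbehind alternation `Saturday |Sunday ` has two different lengths, and .NET accepts it. The sample dates are made up.

```csharp
using System;
using System.Text.RegularExpressions;

// Pattern quoted from the MSDN example: "Saturday " and "Sunday " differ in length,
// so the negative lookbehind is variable-width.
var pattern = new Regex(@"(?<!(Saturday|Sunday) )\b\w+ \d{1,2}, \d{4}\b");

string[] samples = { "Monday February 1, 2010", "Sunday February 7, 2010" };
foreach (string s in samples)
{
    Match m = pattern.Match(s);
    Console.WriteLine(m.Success ? m.Value : "(no match)");
}
// February 1, 2010
// (no match)
```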

I used it like this:
inputText = Regex.Replace(inputText, @"(?<=\<\s*a[^<]+)\bStringToReplace\b(?=.*?\>)", "DBPT");
inputText = Regex.Replace(inputText, @"\bStringToReplace\b", Replacement);
inputText = Regex.Replace(inputText, @"(?<=\<\s*a[^<]+)\bDBPT\b(?=.*?\>)", "StringToReplace");

Taz 2010-05-06 12:59:35

Answer 4

+1 A:

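Although @Chris's solution does not work exactly here, you can use it this way:

using System;
using System.Text.RegularExpressions;
string content = "Pakistan is <a href=\" Pakistan is\">Pakistan an islamic country</a>";
// Replace every whole-word "Pakistan"...
string content2 = Regex.Replace(content, @"\bPakistan\b", "India");
// ...then change it back wherever it follows an opening <a tag (inside the tag or the link text).
string content3 = Regex.Replace(content2, @"(?<=\<\s*a[^<]+)\bIndia\b(?=.*?\>)", "Pakistan");
Console.WriteLine(content3); // India is <a href=" Pakistan is">Pakistan an islamic country</a>

But this is not a very efficient solution.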

Adeel 2010-05-06 11:34:21

Maybe not very efficient, but easy to understand and implement. Thanks.

Taz 2010-05-06 12:18:12

I used it like this:
inputText = Regex.Replace(inputText, @"(?<=\<\s*a[^<]+)\bStringToReplace\b(?=.*?\>)", "DBPT");
inputText = Regex.Replace(inputText, @"\bStringToReplace\b", Replacement);
inputText = Regex.Replace(inputText, @"(?<=\<\s*a[^<]+)\bDBPT\b(?=.*?\>)", "StringToReplace");

Taz 2010-05-06 12:59:51
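The three-step placeholder approach in this comment can be wrapped in a small C# helper, sketched below. `ReplaceWithPlaceholder` is a hypothetical name. The sketch assumes two things: the placeholder `DBPT` never occurs in the input, and neither the target nor the replacement contains `$`, which `Regex.Replace` treats specially in replacement strings. Compared with Answer 4, the placeholder means that text inside a link which already equals the replacement is not rewritten by the restore step.

```csharp
using System;
using System.Text.RegularExpressions;

// Three-step placeholder approach from the comment above, as a helper.
// Assumes "DBPT" never occurs in the input and that target/replacement contain no '$'.
static string ReplaceWithPlaceholder(string input, string target, string replacement)
{
    const string placeholder = "DBPT";
    string word = Regex.Escape(target);
    // Lookbehind: after "<a" with no '<' in between, i.e. in the tag or the link text.
    const string insideLink = @"(?<=\<\s*a[^<]+)";

    // 1. Protect occurrences inside links by swapping in the placeholder.
    string step1 = Regex.Replace(input, insideLink + @"\b" + word + @"\b(?=.*?\>)", placeholder);
    // 2. Replace every remaining occurrence (the ones outside links).
    string step2 = Regex.Replace(step1, @"\b" + word + @"\b", replacement);
    // 3. Turn the placeholder back into the original word.
    return Regex.Replace(step2, insideLink + @"\b" + placeholder + @"\b(?=.*?\>)", target);
}

Console.WriteLine(ReplaceWithPlaceholder(
    "Pakistan is <a href=\" Pakistan is\">Pakistan an islamic country</a>", "Pakistan", "India"));
// India is <a href=" Pakistan is">Pakistan an islamic country</a>
```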

Answer 5

+1 A:

For the first part of your question, I would match either a link or the target word:

Regex r = new Regex(@"<a\s+.*?</a>|\bPakistan\b");

Then I would use a MatchEvaluator to check which one I matched and replace accordingly: if it's a link, plug it back in; if it's the target word, linkify it.
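Below is a minimal C# sketch of this first part, using the regex from this answer unchanged. The evaluator tells the two alternatives apart by checking whether the match starts with `<a`. The `http://example.com/...` URL and the `Linkify` name are made-up placeholders. A link match starts at its `<`, before any `Pakistan` inside it, so the whole link is consumed and the word inside it is never matched on its own.

```csharp
using System;
using System.Text.RegularExpressions;

// Alternation from the answer: either a whole link or the target word.
var r = new Regex(@"<a\s+.*?</a>|\bPakistan\b");

// Hypothetical helper: keep links as they are, wrap the bare word in a new link.
string Linkify(string input) =>
    r.Replace(input, m =>
        m.Value.StartsWith("<a", StringComparison.Ordinal)
            ? m.Value // matched a link: plug it back in unchanged
            : $"<a href=\"http://example.com/{m.Value}\">{m.Value}</a>"); // matched the word: made-up URL

Console.WriteLine(Linkify("Pakistan is <a href=\" Pakistan is\">Pakistan an islamic country</a>"));
// <a href="http://example.com/Pakistan">Pakistan</a> is <a href=" Pakistan is">Pakistan an islamic country</a>
```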

For the second part, you can Join the strings in the array into a regex alternation, like this:

string regex = String.Format(@"\b({0})\b", String.Join("|", links));

Just remember that an alternation returns the first matching alternative, not the longest. If any alternative A is a prefix of alternative B, B should be listed before A. For example, "the Middle East" should come before "the Middle" in your list.

Alan Moore 2010-05-06 13:16:19
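For the second part, here is a C# sketch that builds the alternation with `String.Join` and shows why the order matters. The term lists are hypothetical. `Regex.Escape` is an addition not in the answer, used so that terms containing characters such as `.` or `(` are matched literally.

```csharp
using System;
using System.Linq;
using System.Text.RegularExpressions;

// Build \b(term1|term2|...)\b from a list, as in the answer.
// Regex.Escape is an addition so that terms are matched literally.
string BuildPattern(string[] terms) =>
    String.Format(@"\b({0})\b", String.Join("|", terms.Select(Regex.Escape)));

string[] goodOrder = { "the Middle East", "the Middle", "Pakistan" }; // hypothetical terms
string[] badOrder  = { "the Middle", "the Middle East", "Pakistan" };
string text = "Trade in the Middle East";

// The first alternative that matches wins, not the longest one.
Console.WriteLine(Regex.Match(text, BuildPattern(goodOrder)).Value); // the Middle East
Console.WriteLine(Regex.Match(text, BuildPattern(badOrder)).Value);  // the Middle
```

If the list comes from elsewhere, one simple way to ensure that no term is shadowed by a shorter prefix is to sort it by length, longest first (for example with `OrderByDescending(t => t.Length)`).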
