ansaurus

Question

C# Regex Replace but Replace only a numbered Subgroup

Answer 1

+4 A:

You should use a look-behind construct (it matches a prefix but excludes it from the match). This way, the first part (the "<TD " part) is not part of the match and so is not replaced:

"(?<=<\\w*)\\s*(X:\\w*)"

Philippe Leybaert 2009-07-10 06:58:20

Fantastic, that's it. For reference, the final pattern is "(?<=<\\w*\\s*)(X:\\w*)"

Michael Dausmann 2009-07-10 07:04:08
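
For readers who want to try the final pattern from the comment above, here is a minimal sketch. The class name, method name and sample input are made up for illustration; only the pattern comes from the thread. It relies on .NET allowing variable-length look-behind, which many other regex engines do not.

```csharp
using System;
using System.Text.RegularExpressions;

public static class LookbehindDemo
{
    // Removes an X:-prefixed token that follows a tag name,
    // leaving the "<TD " prefix untouched.
    public static string StripXAttribute(string input)
    {
        // (?<=<\w*\s*) only asserts that "<", an optional tag name and
        // optional whitespace precede the match; it consumes nothing,
        // so only the (X:\w*) part is matched and replaced.
        return Regex.Replace(input, @"(?<=<\w*\s*)(X:\w*)", "");
    }

    public static void Main()
    {
        // Hypothetical sample input, not taken from the question.
        Console.WriteLine(StripXAttribute("<TD X:foo class=\"a\">cell</TD>"));
    }
}
```

Note that the whitespace around the removed token is kept, so the tag ends up with two spaces where the attribute used to be.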

Answer 2

A:

Here is the regex way to do it. I wonder why you don't do it using XSL or XML parsing (removing the attribute), though :-)

public static Regex regex1 = new Regex("^<\\w*\\s*td\\w*\\s*(X:\\w*)",
    RegexOptions.IgnoreCase
    | RegexOptions.CultureInvariant
    | RegexOptions.IgnorePatternWhitespace
    | RegexOptions.Compiled
    );


or "^<\\w*\\s*td\\w*\\s*(X:\\w*)"

Ratnesh Maurya 2009-07-10 07:01:02

I can't use XML parsing because the attribute is not well formed. I am trying to clean up the stupid raw text so I CAN parse it as XML.

Michael Dausmann 2009-07-10 07:05:34
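
The pattern in this answer captures the X: token in group 1 but does not show how to replace only that group. One possible way, sketched below, is a MatchEvaluator that cuts the group's span out of the matched text. This is an assumption about how the pattern might be used, not something the answer states; the class and method names are hypothetical.

```csharp
using System;
using System.Text.RegularExpressions;

public static class GroupSpliceDemo
{
    // Same pattern as in the answer; the ^ anchor means it only
    // matches at the very start of the input.
    private static readonly Regex TdWithX = new Regex(
        @"^<\w*\s*td\w*\s*(X:\w*)",
        RegexOptions.IgnoreCase | RegexOptions.CultureInvariant);

    // Returns the input with only the text captured by group 1 removed.
    public static string RemoveGroup1(string input)
    {
        return TdWithX.Replace(input, m =>
        {
            Group g = m.Groups[1];
            // Group.Index is relative to the whole input, so subtract
            // Match.Index to get the offset inside the matched text.
            return m.Value.Remove(g.Index - m.Index, g.Length);
        });
    }

    public static void Main()
    {
        // Hypothetical sample input.
        Console.WriteLine(RemoveGroup1("<td X:foo class=\"a\">"));
    }
}
```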

Answer 3

A:

Another way to achieve this is to use a replacement string that replaces the whole match with only the first group, dropping the second group that contains the crap.

string sResult = Regex.Replace(sInput, @"(<\w*\s*)(X:\w*\s*)", "$1");

This does not require any look-behinds and so should be quicker (a simple run showed it to be an order of magnitude quicker).

Changing the regex to have a + after the second group will remove all consecutive X: attributes after the tag name, not only the first one (if this is relevant).

string sResult = Regex.Replace(sInput, @"(<\w*\s*)(X:\w*\s*)+", "$1");

Stevo3000 2009-07-10 07:55:06
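
To see the difference between the two patterns in this answer side by side, here is a small sketch. The patterns are the ones given above; the class name, method names and sample input are made up for illustration.

```csharp
using System;
using System.Text.RegularExpressions;

public static class GroupReplaceDemo
{
    // Keeps group 1 (the tag prefix) and drops a single X: token
    // together with its trailing whitespace.
    public static string StripFirst(string input)
    {
        return Regex.Replace(input, @"(<\w*\s*)(X:\w*\s*)", "$1");
    }

    // Same idea, but the + lets group 2 repeat, so a run of
    // consecutive X: tokens is dropped in one match.
    public static string StripRun(string input)
    {
        return Regex.Replace(input, @"(<\w*\s*)(X:\w*\s*)+", "$1");
    }

    public static void Main()
    {
        // Hypothetical sample input with two X: tokens in one tag.
        string sInput = "<TD X:foo X:bar class=\"a\">";
        Console.WriteLine(StripFirst(sInput)); // <TD X:bar class="a">
        Console.WriteLine(StripRun(sInput));   // <TD class="a">
    }
}
```

Because group 2 also consumes the whitespace after each token, this version does not leave a doubled space behind.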
