ansaurus

Question

Regular expression to find instances of strings within XML nodes

Answer 1

+2 A:

It has been said here a thousand times: don't try to "parse" XML with regular expressions. The proper tool to use here is an XML processor.

With one, it is quite easy (and, more importantly, error-free) to select all <Label> elements and the text nodes inside them (My string) and, from them, generate new XML nodes (<Label Content="My string"></Label>). The implementation is left as an exercise for the reader :)

ax 2009-10-15 19:48:11
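Here is a minimal sketch of the XML-processor approach left as an exercise above. It uses Python's standard-library xml.etree.ElementTree as a stand-in for .NET's XmlDocument. The helper name labels_to_attributes and the decision to strip whitespace around the text are assumptions made for illustration. Like XmlDocument, ElementTree does not keep line breaks between attributes, which is the objection raised in the comments below.

```python
import xml.etree.ElementTree as ET


def labels_to_attributes(xml_text: str, tag: str = "Label") -> str:
    """Hypothetical helper: move the text of each <tag> element into a Content attribute."""
    root = ET.fromstring(xml_text)
    for elem in root.iter(tag):
        # Only convert elements that hold plain text and no child elements.
        if len(elem) == 0 and elem.text is not None:
            # Assumption: whitespace around the text is not significant.
            elem.set("Content", elem.text.strip())
            elem.text = None
    # short_empty_elements=False keeps the <Label ...></Label> form instead of <Label ... />.
    return ET.tostring(root, encoding="unicode", short_empty_elements=False)


print(labels_to_attributes("<Grid><Label>My string</Label></Grid>"))
# <Grid><Label Content="My string"></Label></Grid>
```

The serializer escapes attribute values, so text that contains quotes or ampersands still produces well-formed output. A plain text substitution does not guarantee this.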

Well said, ax. +1

toolkit 2009-10-15 19:50:14

I'm doing this manually because I need to preserve line breaks within the XML, and that can't be done with .NET's XmlDocument class.

bsh152s 2009-10-15 19:53:32

Doesn't XmlDocument.PreserveWhitespace do this?

toolkit 2009-10-15 20:00:02

XSLT is also quite capable of preserving whitespace, and regex is not technically capable of doing what you're asking.

annakata 2009-10-15 20:02:15

PreserveWhitespace does not preserve line breaks between attributes. In XAML, we sometimes have a list of 10+ namespaces (xmlns=...). It would be very annoying if those were all on the same line.

bsh152s 2009-10-15 20:21:35

See http://stackoverflow.com/questions/1265255/putting-each-attribute-on-a-new-line-during-xml-serialization for how to put line breaks between attributes (found via http://www.google.com/search?q=xml+line+breaks+between+attributes).

ax 2009-10-16 05:09:59
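For reference, .NET's XmlWriterSettings class has a NewLineOnAttributes property for this layout. It only takes effect when Indent is true. Whether the linked answer relies on it is not shown here. To make the target layout concrete, here is a minimal Python sketch that renders a start tag with one attribute per line. The helper name start_tag_multiline and the four-space indent are assumptions for illustration.

```python
from xml.sax.saxutils import quoteattr


def start_tag_multiline(tag: str, attrs: dict, indent: str = "    ") -> str:
    """Hypothetical helper: render <tag ...> with each attribute on its own line."""
    if not attrs:
        return f"<{tag}>"
    lines = [f"<{tag}"]
    items = list(attrs.items())
    for i, (name, value) in enumerate(items):
        # quoteattr escapes the value and wraps it in quotes.
        closing = ">" if i == len(items) - 1 else ""
        lines.append(f"{indent}{name}={quoteattr(value)}{closing}")
    return "\n".join(lines)


print(start_tag_multiline("UserControl", {
    "xmlns": "http://schemas.microsoft.com/winfx/2006/xaml/presentation",
    "xmlns:x": "http://schemas.microsoft.com/winfx/2006/xaml",
}))
```

Each namespace declaration ends up on its own line, which is the XAML layout described in the comment above.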

Answer 2

+1 A:

You could search for

<(Label|OtherTag|YetAnotherTag)>(\s*[^<]*)</\1>

and replace that with

<\1 Content="\2"></\1>

or even

<\1 Content="\2"/>

If you're absolutely sure that there won't be any nested tags among those that you're looking at, and there really is no other way but regex.

Tim Pietzcker 2009-10-15 20:03:37
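One quick way to try the pattern outside an editor is Python's re module. Its re.sub accepts the same \1 and \2 backreferences in the replacement string. The helper name to_content_attribute and the sample strings are assumptions for illustration.

```python
import re

# The answer's pattern: an opening tag from the list, its text (no '<'), and the matching close tag.
PATTERN = re.compile(r'<(Label|OtherTag|YetAnotherTag)>(\s*[^<]*)</\1>')


def to_content_attribute(text: str) -> str:
    """Hypothetical helper: \\1 is the tag name and \\2 the captured text, as in the answer."""
    return PATTERN.sub(r'<\1 Content="\2"></\1>', text)


print(to_content_attribute("<Label>My string</Label>"))
# <Label Content="My string"></Label>

# Nested markup is left alone, because [^<]* cannot cross a '<'.
print(to_content_attribute("<Label><Bold>x</Bold></Label>"))
# <Label><Bold>x</Bold></Label>
```

The captured text is copied into the attribute verbatim. A double quote inside the text would therefore produce a malformed attribute. Note also that [^<] already matches whitespace, including line breaks, so the leading \s* in the group is redundant.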

It seems to get confused by the [^<]. If I replace the regular expression with ">\s*\w", I get more of what I want, but I still don't think checking for word characters (\w) is all-encompassing. Any ideas on that?

bsh152s 2009-10-15 21:14:17

What is "it"? This regex works in RegexBuddy, Perl, Python, Java, JavaScript, Ruby etc...

Tim Pietzcker 2009-10-16 06:03:06

In C#, this string produces a match--">\r\n\t\r\n <UserControl..."

bsh152s 2009-10-16 13:34:56
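The exact C# pattern that produced that match isn't shown. Assume it was a simplified test such as >\s*[^<]. Then the match comes from backtracking. \s* first consumes all the whitespace, and [^<] fails on the '<'. \s* then gives back one character, and [^<] matches that whitespace character instead. Excluding whitespace from the class, as in [^<\s], requires at least one real text character. Unlike \w, it also accepts punctuation. The same constructs exist in .NET's regex flavor. The sketch below checks them with Python's re, using a sample string modeled on the quoted one (only its leading part matters).

```python
import re

# Sample input modeled on the start of the string quoted in the comment above.
WHITESPACE_ONLY = ">\r\n\t\r\n <UserControl"

# [^<] also matches whitespace, so after backtracking this pattern matches
# even though there is no text between '>' and '<'.
LOOSE = re.compile(r">\s*[^<]")

# [^<\s] requires a character that is neither '<' nor whitespace.
STRICT = re.compile(r">\s*[^<\s]")

# \w misses text that starts with punctuation.
WORD = re.compile(r">\s*\w")

for name, pattern in [("loose", LOOSE), ("strict", STRICT), ("word", WORD)]:
    print(name,
          bool(pattern.search(WHITESPACE_ONLY)),
          bool(pattern.search("<Label>(note)</Label>")))
# loose True True
# strict False True
# word False False
```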
