I need to find all instances of strings within an XML node. To be more specific, I'd like to parse some XAML, take the text content of certain controls (Label, for one), and set it as an attribute instead. So, instead of this:

<Label>My string</Label>

I want this:

<Label Content="My string"></Label>

The regular expression I have come up with is ">\s*[^<]". I read this as matching a greater-than sign, followed by any amount of whitespace, followed by any character other than a less-than sign. However, I'm not getting what I expect. For instance, here is one of the matches:

">\\r\\n\\t\\r\\n    <UserControl..."

Any ideas?

+2  A: 

It has been said here a thousand times: don't try to "parse" XML with regular expressions. The proper tool to use here is an XML processor.

With one, it is quite easy (and, more importantly, far less error-prone) to select all <Label> elements and the text nodes (My string) inside them and, from those, generate new XML nodes (<Label Content="My string"></Label>). The implementation is left as an exercise for the reader :)
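
For readers who would rather not do the exercise, here is a minimal sketch of the idea in Python using the standard library's xml.etree.ElementTree. The TARGET_TAGS set and the text_to_content function are hypothetical names made up for illustration. It also assumes the input has no default namespace; real XAML usually does, in which case the tag names would carry a namespace prefix in braces. It does not try to preserve the original formatting between attributes.

```python
import xml.etree.ElementTree as ET

# Hypothetical set of tag names whose text should become a Content attribute.
TARGET_TAGS = {"Label"}


def text_to_content(xml_source: str) -> str:
    """Move the text of matching leaf elements into a Content attribute."""
    root = ET.fromstring(xml_source)
    for elem in root.iter():
        is_leaf = len(elem) == 0  # skip elements that contain child elements
        has_text = bool(elem.text and elem.text.strip())
        if elem.tag in TARGET_TAGS and is_leaf and has_text:
            # Assumption: surrounding whitespace is not significant, so strip it.
            elem.set("Content", elem.text.strip())
            elem.text = None
    return ET.tostring(root, encoding="unicode")


sample = "<StackPanel><Label>My string</Label><Button>OK</Button></StackPanel>"
print(text_to_content(sample))
```

The serializer takes care of escaping quotes and angle brackets in the attribute value, which is one of the things a plain text replacement gets wrong.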

ax
well said, ax +1
toolkit
The reason I'm doing this the manual way is because I need to preserve line breaks within xml and this can't be done with .NET's XmlDocument class.
bsh152s
Doesn't XmlDocument.PreserveWhitespace do this?
toolkit
XSLT is also quite capable of preserving whitespace, and regex is technically not capable of doing what you're asking.
annakata
PreserveWhitespace does not preserve line breaks between attributes. In XAML, we sometimes have a list of 10+ namespaces (xmlns=...). It would be very annoying if those were all on the same line.
bsh152s
see http://stackoverflow.com/questions/1265255/putting-each-attribute-on-a-new-line-during-xml-serialization for how to put line breaks between attributes (via http://www.google.com/search?q=xml+line+breaks+between+attributes).
ax
+1  A: 

You could search for

<(Label|OtherTag|YetAnotherTag)>(\s*[^<]*)</\1>

and replace that with

<\1 Content="\2"></\1>

or even

<\1 Content="\2"/>

IF you're absolutely sure that there won't be any nested tags among those that you're looking at, and there really is no other way but regex.
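
As a rough sketch, the search and replace above might look like this in Python's re module. The move_text_to_content name is made up for illustration, and the tag list is the placeholder one from the pattern.

```python
import re

# Same pattern as above: a bare opening tag, text with no "<", matching close tag.
PATTERN = re.compile(r"<(Label|OtherTag|YetAnotherTag)>(\s*[^<]*)</\1>")


def move_text_to_content(xaml: str) -> str:
    """Rewrite <Tag>text</Tag> as <Tag Content="text"></Tag> for the listed tags."""
    return PATTERN.sub(r'<\1 Content="\2"></\1>', xaml)


print(move_text_to_content("<Label>My string</Label>"))
# prints: <Label Content="My string"></Label>
```

Note that the replacement syntax differs between flavors: .NET, for example, writes group references in the replacement string as $1 and $2 rather than \1 and \2. Also, a double quote inside the text would end up unescaped in the attribute value, so this only holds for text without quotes.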

Tim Pietzcker
It seems to get confused on the [^<]. If I replace the regular expression with ">\s*\w", I get more of what I want but I still don't think checking for alphanumeric (\w) is all encompassing. Any ideas on that?
bsh152s
What is "it"? This regex works in RegexBuddy, Perl, Python, Java, JavaScript, Ruby etc...
Tim Pietzcker
In C#, this string produces a match: ">\r\n\t\r\n <UserControl..."
bsh152s
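
A likely explanation for that match is backtracking: whitespace is also "any character other than <", so when [^<] fails on the "<", \s* gives back its last whitespace character and [^<] matches that instead. Below is a small sketch reproducing this in Python rather than C#, on the assumption that both engines treat this pattern the same way. The stricter pattern, which also excludes whitespace from the final character class, is a suggestion and not something proposed in the thread.

```python
import re

text = ">\r\n\t\r\n <UserControl"

# Original pattern: \s* backs off one character so that [^<] can match a space.
m = re.search(r">\s*[^<]", text)
print(repr(m.group()))  # '>\r\n\t\r\n '

# Suggested variant: the final character may be neither "<" nor whitespace.
stricter = re.compile(r">\s*[^<\s]")
print(stricter.search(text))  # None
print(stricter.search("<Label>My string</Label>").group())  # >M
```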