I'm attempting to find complete XML objects in a string. They were written into the string by an XmlSerializer, but any given object may or may not be complete. I've toyed with the idea of using a regular expression, since pulling patterns out of a noisy string is what regexes are built for, but the thing I'd be matching is XML, which regexes can't reliably parse.

I'm trying to find complete objects in the form:

<?xml version="1.0"?>
<type>
    <field>value</field>
    ...
</type>

My thought was a regex to find <?xml version="1.0"?><type> and </type>, but if a field is also named type, the regex will stop at the field's closing tag instead of the object's, so it won't work.

There's plenty of documentation on XML parsers, but they seem to all need a complete, fully-formed document to parse. My XML objects can be in a string surrounded by pretty much anything else (including other complete objects).

hw<e>reR@lot$0fr@ndm&nchrs%<?xml version="1.0"?><type><field>...</field>...</type>@ndH#r$omOre!!>nuT6erjc?y!<?xml version="1.0"?><type><field>...</field>...</type>ty!=]

A regex could match a substring while skipping the random characters, but it can't tell whether that substring is a complete XML object. I'd like some way to extract an object, parse it with a serializer, then repeat until the string contains no more valid objects.

A: 

You could try using the Html Agility Pack, which can be used to parse "malformed XML" and make it accessible with a DOM.

It would be necessary to know which element you are looking for (like <type> in your example), because it will be parsing the accidental elements too (like <e> in your example).

Jan Willem B
A: 

Can you use a regular expression to search for the "<?xml" piece and assume that's the beginning of an XML object? You could then use an XmlReader to read and check the remainder of the string until it has parsed one entire element at the root level, and stop reading from the stream once the root node has been completely parsed.

Edit: For more information about using XmlReader, I suggest one of the questions I asked: "I can never predict xmlreader behavior, any tips on understanding?"

My final solution was to stick with the "Read" method when parsing XML and avoid the other methods that read from the stream and advance the current position as a side effect.
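
For illustration, here is a minimal C# sketch of the approach described above. It is not a tested solution, and it makes a few assumptions: the names XmlScanner, ExtractDocuments and TryReadRoot are made up for this example, a plain IndexOf stands in for the regex, and instead of a hand-written Read loop it uses XmlReader.ReadSubtree with XElement.Load so the reader is never asked to parse anything past the root element's closing tag. Candidates that fail to parse (incomplete objects) are skipped.

```csharp
using System;
using System.Collections.Generic;
using System.IO;
using System.Xml;
using System.Xml.Linq;

// Hypothetical helper class; the names here are illustrative only.
static class XmlScanner
{
    // Yields the root element of every complete, well-formed document
    // that starts at an "<?xml" marker somewhere inside text.
    public static IEnumerable<XElement> ExtractDocuments(string text)
    {
        const string marker = "<?xml";
        int start = text.IndexOf(marker, StringComparison.Ordinal);
        while (start >= 0)
        {
            // Assume a document begins here; parsing decides whether it does.
            XElement root = TryReadRoot(text.Substring(start));
            if (root != null)
            {
                yield return root;
            }
            // Resume the search just past this marker.
            start = text.IndexOf(marker, start + marker.Length, StringComparison.Ordinal);
        }
    }

    // Returns the root element, or null if the candidate is not a
    // complete document (for example, the closing tag is missing).
    static XElement TryReadRoot(string candidate)
    {
        try
        {
            using (XmlReader reader = XmlReader.Create(new StringReader(candidate)))
            {
                // Skip the XML declaration and stop on the root element.
                reader.MoveToContent();
                if (reader.NodeType != XmlNodeType.Element)
                {
                    return null;
                }
                // The subtree reader reports end-of-input at the root's end
                // tag, so the junk that follows is never parsed.
                using (XmlReader subtree = reader.ReadSubtree())
                {
                    return XElement.Load(subtree);
                }
            }
        }
        catch (XmlException)
        {
            return null; // malformed or truncated candidate
        }
    }
}
```

Each XElement returned this way is a complete object, so its text (root.ToString()) or a reader over it (root.CreateReader()) can then be handed to the XmlSerializer for the expected type. The Substring call copies the tail of the string for every candidate, which is fine for a sketch but worth revisiting for very large inputs.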

BlueMonkMN
This might be the way to go... I'm not entirely sure how to use the XmlReader, though. I'm reading up on it, but do you have any helpful pointers?
Daniel Rasmussen
Yes, I had some trouble figuring out how to use XmlReader properly, so I asked questions about it here, but I'm reasonably comfortable with it now. I edited the answer to add a link to that question.
BlueMonkMN