I have an XmlReader that is trying to read text into a list of elements. I am having trouble getting it to read the text "a [ z ]". If I try the text "a [ z ]  " (the same, but with two trailing spaces), it works fine. Below is an example:

TextReader tr = new StringReader("a [ z ]");
XmlReaderSettings settings = new XmlReaderSettings
{
    ConformanceLevel = ConformanceLevel.Fragment,
    ProhibitDtd = false,
    ValidationType = ValidationType.None,
    XmlResolver = null,
    CheckCharacters = false,
    IgnoreProcessingInstructions = true,
};
XmlReader reader = XmlReader.Create(tr, settings);
reader.Read();

StringBuilder sb = new StringBuilder();

while (!reader.EOF)
{
    if (reader.NodeType == XmlNodeType.Text || reader.NodeType == XmlNodeType.Whitespace)
    {
        sb.Append(reader.Value);
    }
    // Advance on every iteration so a non-text node cannot stall the loop.
    reader.Read();
}

// sb.ToString() should be "a [ z ]"

When you run it, it fails with the message "System.Xml.XmlException : Unexpected end of file has occurred. Line 1, position 7." and this stack trace:

at System.Xml.XmlTextReaderImpl.Throw(Exception e) 
at System.Xml.XmlTextReaderImpl.ParseText(Int32& startPos, Int32& endPos, Int32& outOrChars)
at System.Xml.XmlTextReaderImpl.FinishPartialValue()
at System.Xml.XmlTextReaderImpl.get_Value()
at LocalisationFormats.Tests.Shared.InlineElements.InlineElementHelperTest.Test()

When you attempt to debug it, the reader is in a ReadState of "Error" and reader.Value is "a [ z"; then you break the reader and get an OutOfMemoryException.

Does anyone have any suggestions?

EDIT: removed extra if block from code snippet on suggestion from Gregoire.

+2  A: 

I believe the problem is that you are loading a string that is not XML-formatted into an XmlReader object.

"XmlReader provides forward-only, read-only access to a stream of XML data. The XmlReader class conforms to the W3C Extensible Markup Language (XML) 1.0 and the Namespaces in XML recommendations." & "XmlReader throws an XmlException on XML parse errors." - MSDN XmlReader Class Article http://msdn.microsoft.com/en-us/library/system.xml.xmlreader.aspx

Try loading and reading actual XML data instead by changing:

TextReader tr = new StringReader("a [ z ]");

to:

TextReader tr = new StringReader("<node>a [ z ]</node>");

or alternatively, if you need each piece in its own node:

TextReader tr = new StringReader("<node>a</node><node> </node><node>[</node><node> </node><node>z</node><node> </node><node>]</node>");

I'm providing complete source for the latter example, because I think that's what you're aiming for here.

TextReader tr = new StringReader("<node>a</node><node> </node><node>[</node><node> </node><node>z</node><node> </node><node>]</node>");
XmlReaderSettings settings = new XmlReaderSettings
{
    ConformanceLevel = ConformanceLevel.Fragment,
    ProhibitDtd = false,
    ValidationType = ValidationType.None,
    XmlResolver = null,
    CheckCharacters = false,
    IgnoreProcessingInstructions = true,
};
XmlReader reader = XmlReader.Create(tr, settings);
reader.Read();

StringBuilder sb = new StringBuilder();

while (!reader.EOF)
{
    string s = reader.ReadElementString();

    if (s != " ")
    {
        sb.Append(s);
    }
}

This will allow you to iterate through the nodes, getting the full string values with no exceptions.
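For the first variant (the whole string wrapped in a single element), a minimal sketch might look like the following. It assumes default XmlReaderSettings and reuses the wrapper element name node from the snippets above; the Program class is only there to make the sample self-contained.

```csharp
using System;
using System.IO;
using System.Text;
using System.Xml;

class Program
{
    static void Main()
    {
        // The text is wrapped in one element so the input is a well-formed XML document.
        using (XmlReader reader = XmlReader.Create(new StringReader("<node>a [ z ]</node>")))
        {
            StringBuilder sb = new StringBuilder();

            // Read() returns false at the end of the input, so no separate EOF check is needed.
            while (reader.Read())
            {
                if (reader.NodeType == XmlNodeType.Text ||
                    reader.NodeType == XmlNodeType.Whitespace ||
                    reader.NodeType == XmlNodeType.SignificantWhitespace)
                {
                    sb.Append(reader.Value);
                }
            }

            Console.WriteLine(sb.ToString()); // prints "a [ z ]"
        }
    }
}
```

Unlike the per-node version, this keeps the spaces, because the whole string arrives as a single text node.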

~md5sum~

md5sum
I thought that setting the XmlReader.ConformanceLevel to Fragment would mean it could parse any well-formed XML (see http://msdn.microsoft.com/en-us/library/h2344bs2.aspx). I thought my text was well-formed XML (just without a root node).
gbanfill
Well-formed XML has to be at LEAST a node, but does not need to follow the single root element rule.
md5sum
A: 

I've checked, and this has been fixed in .NET 4 but is still broken in .NET 3.5 as of this post.

Adrian