I have the following XML parsing code in my application:
    public static XElement Parse(string xml, string xsdFilename)
    {
        var readerSettings = new XmlReaderSettings
        {
            ValidationType = ValidationType.Schema,
            Schemas = new XmlSchemaSet()
        };
        readerSettings.Schemas.Add(null, xsdFilename);
        readerSettings.ValidationFlags |= XmlSchemaValidationFlags.ProcessInlineSchema;
        readerSettings.ValidationFlags |= XmlSchemaValidationFlags.ProcessSchemaLocation;
        readerSettings.ValidationFlags |= XmlSchemaValidationFlags.ReportValidationWarnings;
        readerSettings.ValidationEventHandler +=
            (o, e) => { throw new Exception("The provided XML does not validate against the request's schema."); };
        var readerContext = new XmlParserContext(null, null, null, XmlSpace.Default, Encoding.UTF8);
        return XElement.Load(XmlReader.Create(new StringReader(xml), readerSettings, readerContext));
    }
I am using it to parse strings sent to my WCF service into XML documents, for custom deserialization.
It works fine for requests, where I read in files and send their contents over the wire; I've verified that no BOM is sent across. In my request handler, I serialize a response object and send it back as a string. The serialization process adds a UTF-8 BOM to the front of that string, which causes the same parsing code to break on the response:
System.Xml.XmlException : Data at the root level is invalid. Line 1, position 1.
From the research I've done over the last hour or so, it appears that XmlReader should honor the BOM. If I manually remove the BOM from the front of the string, the response XML parses fine.
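To make "manually remove the BOM" concrete, here is a minimal sketch of that workaround. The `StripBom` helper and the `BomWorkaround` class are hypothetical names used only for illustration, and the sample XML is a placeholder rather than my real response. It assumes the BOM shows up in the string as a single leading U+FEFF character.

```csharp
using System;
using System.Xml.Linq;

public static class BomWorkaround
{
    // Hypothetical helper: removes one leading U+FEFF character, if present.
    public static string StripBom(string xml)
    {
        if (!string.IsNullOrEmpty(xml) && xml[0] == '\uFEFF')
        {
            return xml.Substring(1);
        }
        return xml;
    }

    public static void Main()
    {
        // Placeholder input: a decoded string that still carries the BOM character.
        var withBom = "\uFEFF<Response />";

        // Parsing the stripped string succeeds.
        var root = XElement.Parse(StripBom(withBom));
        Console.WriteLine(root.Name);
    }
}
```

This only treats the symptom on the parsing side; it does not explain why the BOM ends up in the string in the first place.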
Am I missing something obvious, or at least something insidious?
EDIT: Here is the serialization code I'm using to return the response:
    private static string SerializeResponse(Response response)
    {
        var output = new MemoryStream();
        var writer = XmlWriter.Create(output);
        new XmlSerializer(typeof(Response)).Serialize(writer, response);
        var bytes = output.ToArray();
        var responseXml = Encoding.UTF8.GetString(bytes);
        return responseXml;
    }
If it's just a matter of the xml incorrectly containing the BOM, then I'll switch to
var responseXml = new UTF8Encoding(false).GetString(bytes);
but it was not clear at all from my research that the BOM was illegal in the actual XML string; see e.g. http://stackoverflow.com/questions/581318/c-detect-xml-encoding-from-byte-array
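One caveat on that proposed switch: as far as I can tell, the `false` argument to `UTF8Encoding` only controls whether the encoding emits a preamble when writing, and `GetString` does not strip a BOM that is already in the bytes. A sketch of the alternative, under that assumption, is to stop the writer from emitting the BOM by passing `XmlWriterSettings` with a BOM-less encoding. The `Response` class below is a hypothetical stand-in for my real type.

```csharp
using System;
using System.IO;
using System.Text;
using System.Xml;
using System.Xml.Serialization;

// Hypothetical stand-in for the real Response type.
public class Response
{
    public string Message { get; set; }
}

public static class ResponseSerializer
{
    public static string SerializeResponse(Response response)
    {
        var settings = new XmlWriterSettings
        {
            // UTF-8 without a byte order mark, so no preamble is written to the stream.
            Encoding = new UTF8Encoding(false)
        };

        using (var output = new MemoryStream())
        {
            using (var writer = XmlWriter.Create(output, settings))
            {
                new XmlSerializer(typeof(Response)).Serialize(writer, response);
            }
            // The bytes contain no BOM, so the decoded string starts with the XML declaration.
            return Encoding.UTF8.GetString(output.ToArray());
        }
    }

    public static void Main()
    {
        var xml = SerializeResponse(new Response { Message = "ok" });
        Console.WriteLine(xml[0] == '\uFEFF' ? "BOM present" : "no BOM");
    }
}
```

Serializing to a `StringWriter` instead of a `MemoryStream` would avoid the byte round trip entirely, at the cost of the XML declaration reporting UTF-16.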