tags:

views:

342

answers:

1

I've seen several posts here on SO about loading XML documents from some data source where the data has Microsoft's proprietary UTF-8 preamble (for instance, this one).
However, I can't find an elegant (and working!) solution which does not involve stripping out the BOM characters manually.

For instance, there is this example:

byte[] b = System.IO.File.ReadAllBytes("c:\\temp_file_containing_bom.txt");
using (System.IO.MemoryStream oByteStream = new System.IO.MemoryStream(b)) {
    using (System.Xml.XmlTextReader oRD = new System.Xml.XmlTextReader(oByteStream)) {
        System.Xml.XmlDocument oDoc = new System.Xml.XmlDocument();
        oDoc.Load(oRD);
        Console.WriteLine(oDoc.OuterXml);
        Console.ReadLine();
    }
}

...but it still keeps throwing an "invalid data" exception.

My problem is that I have a huge byte array which sometimes contains the BOM and sometimes does not. I need to load it into an XmlDocument, and I don't believe that I should be the one who has to take care of the "helper" bytes.

A: 

That BOM is no longer 'proprietary'. It's written up in the XML specs. Only old versions of Java (1.4) have a problem with it. It's pretty humorous if you've got MS technology exploding.

Use a buffered input stream to filter out the BOM: read the first bytes, skip them if they are the BOM sequence (EF BB BF for UTF-8), and push them back onto the stream if they are not.
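
Since the question already has the data in a byte array, the same filtering can be done without a pushback stream, simply by starting the MemoryStream past the BOM. The following is only a minimal sketch of that idea, not a definitive fix: the class name BomSafeXml and its methods are hypothetical, it only looks for the UTF-8 BOM, and it assumes the rest of the data is well-formed XML.

```csharp
using System;
using System.IO;
using System.Xml;

// Hypothetical helper, for illustration only.
public static class BomSafeXml
{
    // UTF-8 byte order mark: EF BB BF.
    private static readonly byte[] Utf8Bom = { 0xEF, 0xBB, 0xBF };

    // Returns how many leading bytes to skip: 3 if the data starts
    // with a UTF-8 BOM, otherwise 0.
    public static int Utf8BomLength(byte[] data)
    {
        if (data == null || data.Length < Utf8Bom.Length)
        {
            return 0;
        }
        for (int i = 0; i < Utf8Bom.Length; i++)
        {
            if (data[i] != Utf8Bom[i])
            {
                return 0;
            }
        }
        return Utf8Bom.Length;
    }

    // Loads an XmlDocument from the byte array, skipping a UTF-8 BOM if present.
    public static XmlDocument Load(byte[] data)
    {
        if (data == null)
        {
            throw new ArgumentNullException(nameof(data));
        }
        int offset = Utf8BomLength(data);
        // Wrap only the bytes after the BOM; the array itself is not copied.
        using (var stream = new MemoryStream(data, offset, data.Length - offset, false))
        {
            var doc = new XmlDocument();
            doc.Load(stream);
            return doc;
        }
    }
}
```

With this in place, the example from the question would become BomSafeXml.Load(System.IO.File.ReadAllBytes(path)), and the same call works whether or not the array starts with the BOM. If the exception persists after the BOM is skipped, the cause is likely something else in the data.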

bmargulies