ansaurus

Question

How do I remove the BOM character from my xml file

Answer 1

+1 A:

Just strip the first two bytes using any hex editor.

Dev er dev 2008-11-17 12:48:02

Or 3, depending on UTF flavor

MSalters 2008-11-17 12:48:52

Or 4, for UTF-32. But it's most likely 3, UTF-8 being the most common encoding for XML.

Alan Moore 2008-11-24 05:59:53
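If a hex editor isn't handy, the same idea can be sketched in Python. This is a minimal sketch that assumes the file's bytes are already in memory. `strip_bom` and the `BOMS` table are illustrative names, not an existing API. The longer BOMs are checked first because the UTF-32 LE mark (FF FE 00 00) starts with the UTF-16 LE mark (FF FE).

```python
from __future__ import annotations

import codecs

# Known BOMs, longest first: the UTF-32 LE BOM (FF FE 00 00) begins with
# the UTF-16 LE BOM (FF FE), so it must be tested before it.
BOMS = [
    (codecs.BOM_UTF32_BE, "UTF-32 BE"),  # 00 00 FE FF
    (codecs.BOM_UTF32_LE, "UTF-32 LE"),  # FF FE 00 00
    (codecs.BOM_UTF8, "UTF-8"),          # EF BB BF
    (codecs.BOM_UTF16_BE, "UTF-16 BE"),  # FE FF
    (codecs.BOM_UTF16_LE, "UTF-16 LE"),  # FF FE
]


def strip_bom(data: bytes) -> tuple[bytes, str | None]:
    """Return (data without its BOM, name of the BOM found, or None)."""
    for bom, name in BOMS:
        if data.startswith(bom):
            return data[len(bom):], name
    return data, None
```

For example, `strip_bom(b"\xef\xbb\xbf<root/>")` returns `(b"<root/>", "UTF-8")`. For UTF-16 and UTF-32 the BOM is what tells a reader the byte order, so removing it only makes sense if whatever reads the file learns the encoding some other way.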

Answer 2

+1 A:

The BOM character is whitespace (non-breaking and zero-width) outside any tag. That should make it irrelevant. Can you clarify why it's a problem for you?

MSalters 2008-11-17 12:48:32

Answer 3

+1 A:

Unlike with plain text files, a byte order mark in an XML file should never cause any problems, since all XML parsers should be able to deal with it, even if it is the "UTF-8 BOM". In fact, it is even suggested in the XML standard itself as part of character encoding autodetection.

CesarB 2008-11-17 12:56:41

This is not a suggestion: section F is not normative. A UTF-8 BOM is explicitly allowed by the Unicode standard but is not recommended (http://en.wikipedia.org/wiki/Byte_order_mark#cite_note-2), and the UTF-8 BOM does not indicate byte order.

mjustin 2009-12-15 13:55:36
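To illustrate the point that XML parsers are expected to cope with a BOM, here is a small check using Python's standard `xml.etree.ElementTree`, which is backed by the Expat parser. It assumes the document reaches the parser as raw bytes. If some other code decodes it to a string first, the BOM survives as a U+FEFF character and the result may be different.

```python
import codecs
import xml.etree.ElementTree as ET

# A tiny feed with a UTF-8 BOM in front of the XML declaration.
doc = codecs.BOM_UTF8 + b'<?xml version="1.0" encoding="UTF-8"?><feed><item>ok</item></feed>'

# Given raw bytes, Expat treats the BOM as an encoding signature,
# not as document content, so parsing succeeds.
root = ET.fromstring(doc)
print(root.tag, root.find("item").text)  # feed ok
```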

Answer 4

A:

I was under the impression that XML is encouraged to be written in Unicode, in some Unicode encoding, and that certain Unicode encodings are specified to contain an initial byte-order mark. Without that byte-order mark, your file is no longer correctly encoded in a Unicode encoding and therefore no longer correct XML. XML processors are encouraged to be unforgiving, to fail immediately on the slightest error (such as an incorrect Unicode encoding). What kinds of XML processors are you looking to break?

Obviously, stripping a byte-order mark from a UTF-8 encoded document makes that document appear to be ASCII encoded (not Unicode), and some text processors are capable only of using ASCII encoded documents. Is this what you're working with?

Justice 2008-11-17 12:58:00

For XML files which do not specify the encoding and have no BOM, UTF-8 is the default encoding.

mjustin 2009-12-15 14:07:47
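A simplified sketch of the detection order from the (non-normative) autodetection appendix F of the XML 1.0 spec: look for a BOM first, then for an `encoding` pseudo-attribute in the XML declaration, and fall back to UTF-8 when neither is present. `guess_xml_encoding` is a hypothetical helper. It skips the spec's detection of BOM-less UTF-16/UTF-32 and EBCDIC, and the declaration check only works for ASCII-compatible encodings.

```python
import codecs
import re

# BOM signatures, longest first: the UTF-32 LE BOM (FF FE 00 00) begins
# with the UTF-16 LE BOM (FF FE), so order matters.
_BOMS = [
    (codecs.BOM_UTF32_BE, "utf-32-be"),
    (codecs.BOM_UTF32_LE, "utf-32-le"),
    (codecs.BOM_UTF8, "utf-8"),
    (codecs.BOM_UTF16_BE, "utf-16-be"),
    (codecs.BOM_UTF16_LE, "utf-16-le"),
]

# encoding="..." or encoding='...' inside an XML declaration, matched on the
# raw bytes (only meaningful for ASCII-compatible encodings).
_DECL = re.compile(rb"""^<\?xml[^>]*?encoding\s*=\s*["']([A-Za-z][A-Za-z0-9._-]*)["']""")


def guess_xml_encoding(data: bytes) -> str:
    """Simplified guess: BOM first, then the declaration, else UTF-8."""
    for bom, name in _BOMS:
        if data.startswith(bom):
            return name
    match = _DECL.match(data)
    if match:
        return match.group(1).decode("ascii").lower()
    # No BOM and no encoding declaration: XML's default is UTF-8.
    return "utf-8"
```

Note that for a file with a BOM this returns a byte-order-specific codec name, so a caller decoding with it should skip the BOM bytes first, or use Python's `utf-8-sig` or `utf-16` codecs, which consume the BOM themselves.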

Answer 5

A:

Thanks, guys, for all your responses. I have just asked the client for more information on why the XML file is breaking (it's for a data feed). From our standpoint the XML is valid, but apparently they are using an XML validation tool we don't have access to. When I get feedback from them, I'll let you know.

Thanks once again y'all.

2008-11-17 13:05:35

Answer 6

+1 A:

I've had this happen to me, and here's how I solved it (link to another SO question).

George Stocker 2008-11-17 13:19:40

Answer 7

A:

What output encoding is your XSL set to use? What encoding is the input document? Where's the input coming from, and where was it saved/uploaded/downloaded in the meantime?

XML and XSL should default to using UTF-8 if nothing else is specified. But clearly, something's going wrong here.

One thing that might be happening is that the XML is being served by a web server set to serve ISO-8859-1 by default, a pretty good default ... pre-Unicode.

Slightly off-topic, but Joel's very instructive article about text encodings was an eye-opener to me. There are a lot of people out there who are otherwise very smart about programming, but who persist in thinking there's such a thing as "plain text" or calling their text "ASCII" or "ANSI". It's an issue you really need to get to grips with if you haven't yet.

AmbroseChapel 2008-11-17 23:19:27
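One plausible symptom of the ISO-8859-1 scenario, sketched in Python: if something in the chain decodes a UTF-8 file as ISO-8859-1, the three BOM bytes turn into three visible characters at the start of the document, which an XML tool would likely reject as content before the declaration. This only illustrates the failure mode; it is not a diagnosis of the poster's feed.

```python
import codecs

bom = codecs.BOM_UTF8  # the three bytes EF BB BF

# Read as UTF-8, the BOM is one invisible character, U+FEFF.
as_utf8 = bom.decode("utf-8")
print(repr(as_utf8))  # '\ufeff'

# Read as ISO-8859-1, every byte maps to its own visible character.
as_latin1 = bom.decode("iso-8859-1")
print(as_latin1)  # ï»¿
```

Seeing `ï»¿` at the top of a document is a common hint that a UTF-8 BOM was read with a single-byte encoding somewhere along the way.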

Answer 8

+9 A:

# vim file.xml
:set nobomb
:wq

bene 2008-11-18 22:59:37
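The vim recipe above rewrites the file without the UTF-8 BOM. Where vim isn't available, roughly the same thing can be done in Python. This sketch assumes the file is UTF-8 and only removes a UTF-8 BOM; `remove_utf8_bom` is a hypothetical helper name. It works on raw bytes, so line endings are left exactly as they were, but it overwrites the file in place, so try it on a copy first.

```python
import codecs
import os
import tempfile


def remove_utf8_bom(path: str) -> bool:
    """Drop a leading UTF-8 BOM from the file at `path`, in place.

    Returns True if a BOM was found and removed, False otherwise.
    Works on raw bytes, so everything after the BOM is untouched.
    """
    with open(path, "rb") as f:
        data = f.read()
    if not data.startswith(codecs.BOM_UTF8):
        return False
    with open(path, "wb") as f:
        f.write(data[len(codecs.BOM_UTF8):])
    return True


if __name__ == "__main__":
    # Demo on a throwaway file rather than a real feed.
    fd, demo = tempfile.mkstemp(suffix=".xml")
    with os.fdopen(fd, "wb") as f:
        f.write(codecs.BOM_UTF8 + b"<feed/>")
    print(remove_utf8_bom(demo))  # True
    print(remove_utf8_bom(demo))  # False
    os.remove(demo)
```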

Answer 9

A:

The File BOM Detector (freeware for Windows) makes it easy to remove the byte order mark.

Anthony Faull 2010-07-08 10:52:59
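In the same spirit as a BOM-detector tool, here is a small sketch for finding which files in a directory tree start with a UTF-8 BOM, so they can be cleaned up afterwards. `find_bom_files` and its default `*.xml` pattern are illustrative assumptions, and only the UTF-8 BOM is checked.

```python
from __future__ import annotations

import codecs
import tempfile
from pathlib import Path


def find_bom_files(root: str | Path, pattern: str = "*.xml") -> list[Path]:
    """Return files under `root` (recursively) that start with a UTF-8 BOM."""
    hits = []
    for path in sorted(Path(root).rglob(pattern)):
        if path.is_file():
            with path.open("rb") as f:
                # Read only as many bytes as the BOM is long.
                if f.read(len(codecs.BOM_UTF8)) == codecs.BOM_UTF8:
                    hits.append(path)
    return hits


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as tmp:
        d = Path(tmp)
        (d / "with_bom.xml").write_bytes(codecs.BOM_UTF8 + b"<a/>")
        (d / "plain.xml").write_bytes(b"<b/>")
        print([p.name for p in find_bom_files(d)])  # ['with_bom.xml']
```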
