bom

PHP Include function outputting unknown char

When using the PHP include function, the include is successfully executed, but it also outputs a character before the include's own output. The character has hex value 3F and I have no idea where it is coming from, although it seems to happen with every include. At first I thought it was file encoding, but this doesn't seem ...

BOM not expected in CF but sent by IIS/SP

I'm trying to consume a SharePoint web service from ColdFusion via cfinvoke ('cause I don't want to deal with (read: parse) the SOAP response itself). The SOAP response includes a byte-order-mark character (BOM), which produces the following exception in CF: "Cannot perform web service invocation GetList. The fault returned when invoking the...

How do I set the byte order marker for Unicode files?

I know this is not a "real" programming question. But it relates to programming, so I am going to ask it anyway. I have a program that I need to test that reads the Byte Order Marker of the file to see if it is UTF-8 or UTF-16. My problem is I cannot find a program/text editor that will allow me to set the byte order marker. Can anyb...

Remove Byte Order Mark from a File.ReadAllBytes (byte[])

I have an HttpHandler that reads in a set of CSS files, combines them, and then GZips them. However, some of the CSS files contain a Byte Order Mark (due to a bug in TFS 2005 auto merge), and in Firefox the BOM is being read as part of the actual content, so it's screwing up my class names etc. How can I strip out the BOM char...
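
The question above concerns .NET, but the stripping step itself is language-independent. A minimal sketch in Python, assuming the files are UTF-8 (the excerpt does not say which encoding the CSS files use); `strip_utf8_bom` is a hypothetical helper name, not an API from the question:

```python
import codecs


def strip_utf8_bom(data: bytes) -> bytes:
    """Return data without a leading UTF-8 BOM (EF BB BF), if one is present."""
    # codecs.BOM_UTF8 is the three-byte sequence b"\xef\xbb\xbf".
    if data.startswith(codecs.BOM_UTF8):
        return data[len(codecs.BOM_UTF8):]
    # No BOM: hand the bytes back untouched.
    return data
```

Applying this to each file's bytes before concatenation keeps a BOM from ending up in the middle of the combined output.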

XML - Data At Root Level is Invalid

I have an XSD file that is encoded in UTF-8, and any text editor I run it through doesn't show any character at the beginning of the file, but when I pull it up in Visual Studio's debugger, I clearly see an empty box in front of the file. I also get the error: Data at the root level is invalid. Line 1, position 1. Anyone know w...

How Can I Best Guess the Encoding when the BOM (Byte Order Mark) is Missing?

My program has to read files that use various encodings. They may be ANSI, UTF-8 or UTF-16 (big or little endian). When the BOM (Byte Order Mark) is there, I have no problem. I know if the file is UTF-8 or UTF-16 BE or LE. I wanted to assume when there was no BOM that the file was ANSI. But I have found that the files I am dealing wit...
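
One common fallback strategy can be sketched as follows: trust a BOM when it is present, otherwise test whether the bytes are valid UTF-8, and only then assume a legacy single-byte code page. This is a heuristic under stated assumptions, not a definitive detector: `guess_encoding` is a hypothetical name, `cp1252` stands in for "ANSI", and UTF-16 without a BOM is not handled.

```python
import codecs


def guess_encoding(data: bytes, fallback: str = "cp1252") -> str:
    """Guess a Python codec name for data using the BOM, then a UTF-8 validity check."""
    if data.startswith(codecs.BOM_UTF8):
        return "utf-8-sig"  # this codec skips the BOM when decoding
    if data.startswith((codecs.BOM_UTF16_LE, codecs.BOM_UTF16_BE)):
        return "utf-16"  # this codec reads the BOM to choose the byte order
    try:
        # Strict decoding fails on byte sequences that are not valid UTF-8,
        # which is what most non-ASCII text in a legacy code page looks like.
        data.decode("utf-8")
        return "utf-8"
    except UnicodeDecodeError:
        return fallback  # assumption: anything else is the legacy code page
```

Pure-ASCII input is reported as UTF-8, which is harmless because ASCII decodes identically under both guesses.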

How to avoid tripping over UTF-8 BOM when reading files

I'm consuming a data feed that has recently added a Unicode BOM header (U+FEFF), and my rake task is now messed up by it. I can skip the first 3 bytes with file.gets[3..-1] but is there a more elegant way to read files in Ruby which can handle this correctly, whether a BOM is present or not? ...

Does Java have methods to get the various byte order marks?

I am looking for a utility method or constant in Java that will return me the bytes that correspond to the appropriate byte order mark for an encoding, but I can't seem to find one. Is there one? I really would like to do something like: byte[] bom = Charset.forName( CharEncoding.UTF8 ).getByteOrderMark(); Where CharEncoding comes fr...

How do I encode/decode UTF-16LE byte arrays with a BOM?

I need to encode/decode UTF-16 byte arrays to and from java.lang.String. The byte arrays are given to me with a Byte Order Marker (BOM), and I need to encode byte arrays with a BOM. Also, because I'm dealing with a Microsoft client/server, I'd like to emit the encoding in little endian (along with the LE BOM) to avoid any misunderstand...
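
The question is about Java; the round trip it describes is shown here in Python only as an illustration of the byte layout. The two function names are hypothetical. The sketch assumes the encoder should always emit little-endian output with the `FF FE` BOM, while the decoder should accept either byte order as announced by the BOM.

```python
import codecs


def encode_utf16le_with_bom(text: str) -> bytes:
    """Encode text as UTF-16 little endian, prefixed with the LE BOM (FF FE)."""
    # The "utf-16-le" codec never writes a BOM itself, so prepend it explicitly.
    return codecs.BOM_UTF16_LE + text.encode("utf-16-le")


def decode_utf16_with_bom(data: bytes) -> str:
    """Decode UTF-16 bytes, using the leading BOM to choose the byte order."""
    # The generic "utf-16" codec consumes the BOM and decodes accordingly.
    return data.decode("utf-16")
```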

Write to utf-8 file in python

I'm really confused with the codecs.open function. When I do: file = codecs.open("temp", "w", "utf-8") file.write(codecs.BOM_UTF8) file.close() It gives me the error UnicodeDecodeError: 'ascii' codec can't decode byte 0xef in position 0: ordinal not in range(128) If I do: file = open("temp", "w") file.write(codecs.BOM_UTF8) file.cl...
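
The error quoted above is the kind Python 2 raises when a byte string is handed to an encoding writer, which first tries to decode it as ASCII. In Python 3 the usual way to get a UTF-8 file that starts with a BOM is the `utf-8-sig` codec. A minimal sketch, assuming Python 3; `write_utf8_with_bom` and `demo` are hypothetical names:

```python
import codecs
import os
import tempfile


def write_utf8_with_bom(path: str, text: str) -> None:
    """Write text as UTF-8, preceded by a single BOM."""
    # "utf-8-sig" emits EF BB BF once, before the first characters written.
    with open(path, "w", encoding="utf-8-sig", newline="") as f:
        f.write(text)


def demo() -> bytes:
    """Write a small temporary file and return its raw bytes for inspection."""
    fd, path = tempfile.mkstemp()
    os.close(fd)
    try:
        write_utf8_with_bom(path, "hello")
        with open(path, "rb") as f:
            return f.read()
    finally:
        os.remove(path)
```

Reading the file back with the same `utf-8-sig` codec drops the BOM again, so the text round-trips unchanged.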

Why would I use a Unicode Signature Byte-Order-Mark (BOM)?

Are these obsolete? They seem like the worst idea ever -- embed something in the contents of your file that no one can see, but impacts the file's functionality. I don't understand why I would want one. ...

Using awk to remove the Byte-order mark

Hi, does anyone have an idea what an awk script (presumably a one-liner) for removing a BOM would look like? Specification: print every line after the first (NR > 1); for the first line, if it starts with #FE #FF or #FF #FE, remove those and print the rest ...

In .NET, how do I write a UTF-16 XMLDocument to a string with a BOM

I am building an XmlDocument on the fly in .NET with an xml document. I then transform that with the Transform() method of an XslCompiledTransform. The Transform() method threw an exception because an invalid character for the encoding was found in the stream. When I copy/paste the string with the help of the TextVisualizer in Visual St...

How can I identify different encodings without the use of a BOM?

I have a file watcher that is grabbing content from a growing file encoded with utf-16LE. The first bit of data written to it has the BOM available -- I was using this to identify the encoding against UTF-8 (which MOST of my files coming in are encoded in). I catch the BOM and re-encode to UTF-8 so my parser doesn't freak out. The proble...

ï»¿ character (UTF-8 BOM) in middle of ASP.NET response due to HttpResponse.TransmitFile()

I've seen this post: ï»¿ characters appended to the beginning of each file. In that case, the author was manually reading the source file and writing the contents. In my case, I'm abstracting it away via HttpResponse.TransmitFile(): public void ProcessRequest(HttpContext context) { HttpRequest req = context.Request; HttpResponse ...

Prepend BOM to XML response from Django

I use Django's render_to_response to return an XML document. This particular XML document is intended for a Flash-based charting library. The library requires that the XML document start with a BOM (byte order marker). How can I make Django prepend the BOM to the response? It works to insert the BOM into the template, but it's inconv...

Change encoding to UTF-8 recursively on Windows?

Does anybody know a tool, preferably for the Explorer context menu, to recursively change the encoding of files in a project from / to UTF-8 and other encodings? Freeware or not too expensive would be great. Edit: Thanks for the answers, +1 for all of them as they are all fine but I am a lazy bastard sometimes, and would really like to ...

ASP.NET: BOM in Server.Execute()

I'm using this to write to the Response stream: using (var writer = new StringWriter()) { context.Server.Execute(virtualpath, writer); string s = writer.ToString().Replace(...); context.Response.Write(s); } But I'm getting a byte order mark in the response. Am I screwing up the encoding? How do I NO...

How can I remove BOM from XmlTextWriter using C#

Hi, I need to remove the BOM from an XML file that is being created. I have tried using the new UTF8Encoding(false) method but it doesn't work. Here is the code I have: XmlDocument xmlDoc = new XmlDocument(); XmlTextWriter xmlWriter = new XmlTextWriter(filename, new UTF8Encoding(false)); xmlWriter.Formatting = Formatting....

Why can't I use the map function to create a good hash from a simple data file in Perl?

The post is updated. Please kindly jump to the Solution part if you've already read the posted question. Thanks! Here's the minimized code to exhibit my problem: The input data file for the test has been saved by Windows' built-in Notepad with UTF-8 encoding. It has the following three lines: abacus æbәkәs abalone æbәlәuni abandon әbænd...