I'm running doxygen (1.5.8) on a C# project from Visual Studio 2008 on a Windows XP machine. While generating the LaTeX code, some illegal sequences are included. It always involves the following sequence: "ï»¿" (a Latin i with a dieresis, something like the binary shift operator, and a Spanish opening question mark). I've seen it happen in the context "ï»¿using {\bf System}", but maybe there are others.

The generated LaTeX file reads:

    \begin{CompactItemize}
    \item 
    ï»¿using {\bf System}
    \end{CompactItemize}

While the source is simply:

    using System;
    using System.Collections.Generic;
    using System.Linq;

Some strange Windows BOF character? It seems to appear only before the using System; directive (the first one in each file).

EDIT: Thanks to all. As stated in my accepted answer below, this is the BYTE ORDER MARK (BOM) character. Obviously and unsurprisingly, Visual Studio is acting up. The good thing is that there is a way to save files in UTF-8 without the BOM signature: File -> Advanced Save Options -> Encoding (UTF-8 without signature) - Codepage 65001. The bad thing is that there seems to be no way to save all the existing files of a solution with this encoding in batch, so to speak, and each file has to be saved individually. Another quirk I found (at least in my case) is that File -> Advanced Save Options is not available until you double-click the class and the file is open in the editor. Oh well...
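For anyone who would rather not re-save every file by hand, a small script run outside Visual Studio might do the batch job. This is only a sketch, not a Visual Studio feature: the function names `strip_utf8_bom` and `strip_tree` and the `*.cs` pattern are my own assumptions, and it rewrites files in place, so try it on a copy or a clean working tree under version control first.

```python
import codecs
from pathlib import Path


def strip_utf8_bom(path):
    """Remove a leading UTF-8 BOM from one file; return True if one was removed."""
    data = path.read_bytes()
    if not data.startswith(codecs.BOM_UTF8):
        return False
    # Write back everything after the three BOM bytes, untouched.
    path.write_bytes(data[len(codecs.BOM_UTF8):])
    return True


def strip_tree(root, pattern="*.cs"):
    """Strip the BOM from every matching file under root; return the changed paths."""
    return [p for p in sorted(Path(root).rglob(pattern)) if strip_utf8_bom(p)]
```

Calling `strip_tree` with the solution directory returns the list of files it changed. Files without a BOM are left alone, so running it twice is harmless.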

A: 

Do you have any idea what the text should look like?

There are a couple of possibilities, the most obvious one being that you've got some random Unicode there, and those are the characters you get from the TeX font.

Charlie Martin
A: 

It looks like a character encoding problem to me.

That three-character sequence is the Unicode byte-order mark 0xfeff encoded in UTF-8, although I'm not sure why the byte-order mark would be showing up in the middle of your documentation... that could be significant or it could just be a coincidence.
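A quick way to check the claim about the encoding, sketched in Python (the variable names are just for illustration): encoding U+FEFF as UTF-8 gives the three bytes EF BB BF, and the `utf-8-sig` codec drops that prefix when decoding.

```python
import codecs

# U+FEFF (BYTE ORDER MARK) encoded as UTF-8 is the three bytes EF BB BF.
bom_bytes = "\ufeff".encode("utf-8")
print(bom_bytes.hex(" "))            # ef bb bf
print(bom_bytes == codecs.BOM_UTF8)  # True

# A file that starts with those bytes, decoded two ways.
raw = bom_bytes + b"using System;"
plain = raw.decode("utf-8")      # keeps U+FEFF as the first character
clean = raw.decode("utf-8-sig")  # drops a leading BOM if present
print(len(plain) - len(clean))   # 1
```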

David Zaslavsky
A: 

That's an ISO-8859-1 representation of the UTF-8 encoded character U+FEFF, the BYTE ORDER MARK. The BOM is intended for use as the first code point in UTF-16 files and should not be used in UTF-8 files, but there are some very stupid tools that produce it by default, unfortunately. And if you are creating files by concatenating bits of text from other files you can even end up with BOMs in the middle of your document.
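To make both points concrete, here is a minimal Python sketch (names are illustrative only). It reads the UTF-8 BOM bytes as ISO-8859-1, which yields the three characters from the question, and then concatenates two BOM-prefixed chunks to show how a BOM can end up mid-document.

```python
bom = "\ufeff".encode("utf-8")

# Misreading the three bytes as ISO-8859-1 gives one character per byte:
# U+00EF (i with dieresis), U+00BB (right guillemet), U+00BF (inverted question mark).
mojibake = bom.decode("iso-8859-1")
print([f"U+{ord(ch):04X}" for ch in mojibake])  # ['U+00EF', 'U+00BB', 'U+00BF']

# Concatenating two BOM-prefixed files: a BOM-aware decoder strips only the
# first BOM, so the second one survives in the middle of the text.
part = bom + b"using System;\n"
joined = (part + part).decode("utf-8-sig")
print(joined.count("\ufeff"))  # 1
```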

Find the editor that is saving files as “UTF-8 with BOM” and burn it.

ETA re updated question:

    ï»¿using {\bf System}

While the source is simply:

    using System;

Check that source in a hex editor for a hidden faux-BOM before the ‘using’.
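If no hex editor is at hand, a few lines of Python can do the same check. This is a sketch with hypothetical helper names (`head_hex`, `starts_with_utf8_bom`); it only reads the file.

```python
import codecs


def head_hex(path, count=8):
    """Return the first `count` bytes of the file as space-separated hex."""
    with open(path, "rb") as handle:
        return handle.read(count).hex(" ")


def starts_with_utf8_bom(path):
    """True if the file begins with the UTF-8 encoded BOM (EF BB BF)."""
    with open(path, "rb") as handle:
        return handle.read(len(codecs.BOM_UTF8)) == codecs.BOM_UTF8
```

For a file saved as "UTF-8 with signature" that begins with `using`, `head_hex` should return `ef bb bf 75 73 69 6e 67`; without the BOM the output starts directly at `75 73 69 6e 67`.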

bobince