I have a program that lets a user type Java code into a rich text box and then compiles it with the Java compiler. Whenever I try to compile code written this way, I get an error saying there is an illegal character at the beginning of my code, even though no such character is there. This is the error the compiler gives me:

C:\Users\Travis Michael>"\Program Files\Java\jdk1.6.0_17\bin\javac" Test.java
Test.java:1: illegal character: \187
public class Test
 ^
Test.java:1: illegal character: \191
public class Test
  ^
2 errors
+1  A: 

http://en.wikipedia.org/wiki/Byte_order_mark

The byte order mark (BOM) is a Unicode character used to signal the endianness (byte order) of a text file or stream. Its code point is U+FEFF. BOM use is optional, and, if used, should appear at the start of the text stream. Beyond its specific use as a byte-order indicator, the BOM character may also indicate which of the several Unicode representations the text is encoded in.

The BOM is a funky-looking character that you sometimes find at the start of Unicode streams, giving a clue what the encoding is. It's usually handled invisibly by the string-handling stuff in Java, so you must have confused it somehow, but without seeing your code, it's hard to see where.
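To connect this to the error in the question, here is a minimal sketch that encodes U+FEFF as UTF-8 and prints each byte as an unsigned decimal value. The class and method names (`BomBytes`, `utf8BomAsUnsigned`) are made up for illustration.

```java
import java.nio.charset.StandardCharsets;

public class BomBytes {
    // Encode U+FEFF as UTF-8 and return each byte as an unsigned value (0..255).
    static int[] utf8BomAsUnsigned() {
        byte[] raw = "\uFEFF".getBytes(StandardCharsets.UTF_8);
        int[] out = new int[raw.length];
        for (int i = 0; i < raw.length; i++) {
            out[i] = raw[i] & 0xFF; // Java bytes are signed, so mask to get 0..255
        }
        return out;
    }

    public static void main(String[] args) {
        for (int b : utf8BomAsUnsigned()) {
            System.out.printf("0x%02X = %d%n", b, b);
        }
    }
}
```

The three bytes are 0xEF, 0xBB and 0xBF, that is 239, 187 and 191. The last two match the `\187` and `\191` that javac complains about. The first byte is presumably accepted because, in a single-byte Windows code page, 0xEF decodes to a letter that Java allows in identifiers.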

You might be able to fix it trivially by manually stripping the BOM from the string before feeding it to javac. It probably qualifies as whitespace, so try calling trim() on the input String, and feeding the output of that to javac.
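If trim() turns out not to help, the BOM can be removed explicitly. The sketch below is in Java and assumes the text has already been decoded so that the BOM shows up as a single leading U+FEFF character; `BomStrip` and `stripLeadingBom` are hypothetical names. Java's own String.trim() only removes characters at or below U+0020, so it leaves U+FEFF in place.

```java
public class BomStrip {
    // Remove a single leading U+FEFF if present; otherwise return the text unchanged.
    static String stripLeadingBom(String text) {
        if (text != null && !text.isEmpty() && text.charAt(0) == '\uFEFF') {
            return text.substring(1);
        }
        return text;
    }

    public static void main(String[] args) {
        String source = "\uFEFFpublic class Test {}";
        // trim() does not treat U+FEFF as whitespace, so the BOM survives it.
        System.out.println(source.trim().startsWith("\uFEFF"));
        // The explicit check removes it.
        System.out.println(stripLeadingBom(source).startsWith("public"));
    }
}
```

The same idea carries over to other languages: check the first character and drop it if it is U+FEFF.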

skaffman
Let me try. One sec
muckdog12
I tried to trim it and it did not work. By the way, I am using VB.NET.
muckdog12
vb.net.... now that would've been useful information to have at the start....
skaffman
Well.... then can you help me?
muckdog12
Regarding "giving a clue what the encoding is" I just want to point out: Although the BOM *can* give a hint as to the encoding it is *not* intended to be used for this purpose. As the name suggests it tells you only the byte order. In fact in UTF-16 and UTF-32 (little endian) there is an ambiguity that means that the BOM cannot be used to tell them apart reliably. The BOM is not a replacement for correctly handling character encoding issues.
Mark Byers
Would saving the file with a different encoding help?
muckdog12
@Mark: Good point, well made - I oversimplified in haste. @muckdog: Sorry, can't help you there, VB.NET isn't my thing.
skaffman
That's alright. Thanks for the help
muckdog12
How might I be able to get rid of the BOM?
muckdog12
muckdog12: it's just a character like any other character. You can remove it using any of the string operations that you would normally use to remove characters.
Mark Byers
I can't seem to find Encoding.Default.
muckdog12
Ok. Thank you for all your help.
muckdog12
A: 
  1. If using an IDE, specify the Java file encoding (via the properties panel).
  2. If NOT using an IDE, use an advanced text editor (I can recommend Notepad++) and set the encoding to "UTF-8 without BOM", or "ANSI", if that suits you.
Bozho
A: 

Be absolutely certain that you are writing the file in the same character encoding as you are telling javac to use.
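As a rough illustration of keeping the two in step, this Java sketch uses one `Charset` value both to write the source file and to build the javac command line. `-encoding` is a real javac option; the class and method names are invented for the example, and the temporary directory is just a convenient place to write to.

```java
import java.io.IOException;
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.Arrays;
import java.util.List;

public class MatchingEncoding {
    // Write the source text using exactly the given charset.
    static void writeSource(Path file, String source, Charset charset) throws IOException {
        Files.write(file, source.getBytes(charset));
    }

    // Build a javac command line that names the same charset via -encoding.
    static List<String> javacCommand(Path file, Charset charset) {
        return Arrays.asList("javac", "-encoding", charset.name(), file.toString());
    }

    public static void main(String[] args) throws IOException {
        Charset charset = StandardCharsets.UTF_8; // one choice, used for both steps
        Path file = Files.createTempDirectory("bom-demo").resolve("Test.java");
        writeSource(file, "public class Test {}\n", charset);
        System.out.println(String.join(" ", javacCommand(file, charset)));
    }
}
```

Java's UTF-8 encoder does not emit a BOM, so a file written this way starts directly with the source text.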

Thorbjørn Ravn Andersen
I am writing the file with the rich text box's default SaveFile() method. I am saving the data to a .java file. When I went to change the encoding, I couldn't find any encoding options
muckdog12
+2  A: 

The BOM is generated by, say, File.WriteAllText() or StreamWriter when you don't specify an Encoding. The default is to use the UTF8 encoding and generate a BOM. You can tell the java compiler about this with its -encoding command line option.

The path of least resistance is to avoid generating the BOM. Do so by specifying System.Text.Encoding.Default, which writes the file using the default code page of your operating system and does not write a BOM. Use the File.WriteAllText(String, String, Encoding) overload or the StreamWriter(String, Boolean, Encoding) constructor.

Just make sure that the file you create doesn't get compiled by a machine in another corner of the world. It will produce mojibake.

Hans Passant
Thank you so much. It finally worked! Maybe one day Microsoft will get rid of BOM and all the other bugs that they have!
muckdog12
Be careful throwing that bug bomb. That a relatively new chunk of software like the Java compiler cannot auto detect UTF8 is pretty stunning. This appears to be a problem in Vietnam too: http://vietunicode.sourceforge.net/howto/java/encoding.html
Hans Passant
A: 

That's a byte order mark, as everyone says.

javac does not understand the BOM, not even when you try something like

javac -encoding UTF8 Test.java

You need to strip the BOM or convert your source file to another encoding. Notepad++ can convert a single file's encoding; I'm not aware of a batch utility on the Windows platform for this.
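For several files at once, a small program can do the stripping. The following is only a sketch, with hypothetical names (`BomFileCleaner`, `withoutUtf8Bom`): it looks for the UTF-8 BOM bytes EF BB BF at the start of each file named on the command line and rewrites the file without them. It overwrites files in place, so try it on copies first.

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;
import java.util.Arrays;

public class BomFileCleaner {
    // Return the bytes without a leading UTF-8 BOM (EF BB BF), if there is one.
    static byte[] withoutUtf8Bom(byte[] data) {
        boolean hasBom = data.length >= 3
                && (data[0] & 0xFF) == 0xEF
                && (data[1] & 0xFF) == 0xBB
                && (data[2] & 0xFF) == 0xBF;
        return hasBom ? Arrays.copyOfRange(data, 3, data.length) : data;
    }

    // Rewrite each file named on the command line in place, minus any BOM.
    public static void main(String[] args) throws IOException {
        for (String name : args) {
            Path file = Paths.get(name);
            byte[] original = Files.readAllBytes(file);
            byte[] cleaned = withoutUtf8Bom(original);
            if (cleaned.length != original.length) {
                Files.write(file, cleaned); // only touch files that actually had a BOM
            }
        }
    }
}
```

Usage would be something like `java BomFileCleaner Test.java Other.java`, followed by the usual javac call.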

The Java compiler will assume the file is in your platform default encoding, so if you save the file in that encoding, you don't have to specify the encoding.

zneo
A: 

Hey, I used to have the same problem as you, and I was also using Notepad. The only thing you need to do to get rid of the problem is to save your file in ANSI format. javac will understand it as soon as you do. Good luck, and let me know if I can help you with anything else.

ARCR

Abed Calderon