A fairly straightforward question: is there a way to configure Eclipse to work with UTF-8 encoded text files both with and without the BOM?

So far I've used Eclipse with UTF-8 encoding and it works. But when I try to edit a file generated by another editor that includes the BOM, Eclipse doesn't handle it properly: it shows an "invisible character" (the BOM) at the beginning of the file. Is there a way to make Eclipse understand UTF-8 encoded files with a BOM?

+3  A: 

Neither bug 78455 ("Provide an option to force writing a BOM to UTF-8 files") nor bug 136854 leaves much hope for such an option.

The support for encoding in the workspace is based on what is available from Java.
For any given resource in the workspace, it is possible to obtain a charset string that can be used with any Java APIs that take charset strings.
Examples are:

  • 'US-ASCII',
  • 'UTF-8',
  • 'Cp1252',
  • 'UTF-16' (Big Endian, BOM inserted automatically),
  • 'UTF-16BE' (Big Endian, BOM not inserted automatically),
  • 'UTF-16LE' (Little Endian, BOM not inserted automatically).
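To make the difference between these charsets concrete, here is a small sketch that encodes the single character "A" with several of them and prints the resulting bytes. The class name BomEncodingDemo and the hex helper are made up for illustration; only the standard java.nio.charset.StandardCharsets constants are used.

```java
import java.nio.charset.StandardCharsets;

public class BomEncodingDemo {

    // Hypothetical helper: render bytes as space-separated uppercase hex.
    public static String hex(byte[] bytes) {
        StringBuilder sb = new StringBuilder();
        for (byte b : bytes) {
            if (sb.length() > 0) {
                sb.append(' ');
            }
            sb.append(String.format("%02X", b & 0xFF));
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        String text = "A";
        // Only the generic UTF-16 charset prepends a BOM (FE FF) when encoding.
        System.out.println("UTF-8    : " + hex(text.getBytes(StandardCharsets.UTF_8)));    // 41
        System.out.println("UTF-16   : " + hex(text.getBytes(StandardCharsets.UTF_16)));   // FE FF 00 41
        System.out.println("UTF-16BE : " + hex(text.getBytes(StandardCharsets.UTF_16BE))); // 00 41
        System.out.println("UTF-16LE : " + hex(text.getBytes(StandardCharsets.UTF_16LE))); // 41 00
    }
}
```

Note that the UTF-8 output has no EF BB BF prefix: Java's UTF-8 encoder never writes a BOM by itself.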

For Java encodings other than 'UTF-16', BOMs are neither inserted automatically when writing nor discarded automatically when reading.
Even if this is puzzling to end users, this is how all Java applications work.
If applications want to support creating UTF-8 files with BOMs to match their users' expectations, they need to provide such capability on their own (as neither Java nor the Resources model will help with that).
Eclipse does provide some improvements towards detecting BOMs, but not with generating or skipping them.
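As a rough illustration of what "providing such capability on their own" could look like in application code, here is a minimal sketch. It is not what Eclipse does internally; the class Utf8BomUtil and both method names are hypothetical, and the approach simply relies on the fact that Java's UTF-8 decoder leaves a leading BOM in the decoded string as the character U+FEFF.

```java
import java.nio.charset.StandardCharsets;

public class Utf8BomUtil {

    // The BOM code point as it appears in a decoded Java string.
    private static final char BOM = '\uFEFF';

    // Decode UTF-8 bytes and drop a leading BOM if one is present.
    public static String decodeUtf8SkippingBom(byte[] data) {
        String s = new String(data, StandardCharsets.UTF_8);
        return (!s.isEmpty() && s.charAt(0) == BOM) ? s.substring(1) : s;
    }

    // Encode text as UTF-8 and prepend the three BOM bytes EF BB BF.
    public static byte[] encodeUtf8WithBom(String text) {
        byte[] body = text.getBytes(StandardCharsets.UTF_8);
        byte[] out = new byte[body.length + 3];
        out[0] = (byte) 0xEF;
        out[1] = (byte) 0xBB;
        out[2] = (byte) 0xBF;
        System.arraycopy(body, 0, out, 3, body.length);
        return out;
    }

    public static void main(String[] args) {
        byte[] withBom = encodeUtf8WithBom("hello");

        // Plain decoding keeps the BOM as an invisible first character.
        String raw = new String(withBom, StandardCharsets.UTF_8);
        System.out.println(raw.length());                            // 6
        System.out.println(decodeUtf8SkippingBom(withBom).length()); // 5
    }
}
```

The extra character in the plain decoding is the same "invisible character" described in the question, seen from the Java side.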

VonC