I'm currently writing an Eclipse text editor plugin for a custom language.

The problem is that the tool that parses these files does not understand Unicode, but the editor should show Unicode math symbols.

There is already a NetBeans plugin that handles this by translating Unicode characters into multiple ANSI characters. E.g. U+27F6 (long rightwards arrow) is encoded as --> when writing to disk, and vice versa when loading.
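A minimal plain-Java sketch of that kind of translation, assuming a lookup table from Unicode symbols to ASCII tokens. MathAsciiCodec is a hypothetical name, and only the U+27F6 / --> entry comes from the question; the other two entries are invented to show why loading should prefer the longest matching token.

```java
import java.util.LinkedHashMap;
import java.util.Map;

// Hypothetical helper: converts between the editor's Unicode text and the
// ASCII-only form the external parser reads.
final class MathAsciiCodec {

    // Unicode symbol -> ASCII token. Only U+27F6 -> "-->" comes from the
    // question; the other entries are made-up examples.
    private static final Map<String, String> TO_ASCII = new LinkedHashMap<>();
    static {
        TO_ASCII.put("\u27F6", "-->"); // LONG RIGHTWARDS ARROW
        TO_ASCII.put("\u2264", "<=");  // LESS-THAN OR EQUAL TO (example)
        TO_ASCII.put("\u21D4", "<=>"); // LEFT RIGHT DOUBLE ARROW (example)
    }

    // On save: replace every mapped symbol with its ASCII token.
    static String toAscii(String text) {
        StringBuilder sb = new StringBuilder(text.length());
        int i = 0;
        while (i < text.length()) {
            int cp = text.codePointAt(i);
            String symbol = new String(Character.toChars(cp));
            sb.append(TO_ASCII.getOrDefault(symbol, symbol));
            i += Character.charCount(cp);
        }
        return sb.toString();
    }

    // On load: at each position take the LONGEST matching token, so "<=>"
    // is read as one symbol rather than "<=" followed by ">".
    static String fromAscii(String text) {
        StringBuilder sb = new StringBuilder(text.length());
        int i = 0;
        while (i < text.length()) {
            String bestSymbol = null;
            int bestLen = 0;
            for (Map.Entry<String, String> e : TO_ASCII.entrySet()) {
                String token = e.getValue();
                if (token.length() > bestLen && text.startsWith(token, i)) {
                    bestSymbol = e.getKey();
                    bestLen = token.length();
                }
            }
            if (bestSymbol != null) {
                sb.append(bestSymbol);
                i += bestLen;
            } else {
                sb.append(text.charAt(i));
                i++;
            }
        }
        return sb.toString();
    }
}
```

One caveat of any such scheme: an ASCII token typed literally in the editor (say <=) comes back as its symbol after a save/load cycle, so the tokens effectively become reserved.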

I've been looking for days now and can't find an API or anything else that would let me do this cleanly on the Eclipse platform.

Does anyone know how to do this?

A: 

Does setting the charset through IFile.setCharset() work?

Prakash G. R.
I looked at that. That would probably help if I knew how to define my own charset/encoding.
Axel Gneiting
A:

I am not sure what you mean by "encoded to -->".
Not the actual ASCII characters, I suppose, since there is no way to translate every Unicode character into an ASCII representation.
For arrows alone, defining ASCII tokens for every arrow and arrow-like symbol would be... a rather large job!

I know about native2ascii, which does this conversion (also available as a NetBeans plugin).


(not to be confused with the native2ascii.exe bundled with the JDK)

For Eclipse, you could use an Ant task (which you can call from your Java program) equivalent to:

<native2ascii encoding="EUCJIS" src="srcdir" dest="srcdir"
   includes="**/*.eucjis" ext=".java"/>

(which, here, converts all files in the directory srcdir ending in .eucjis from the EUCJIS encoding to ASCII and renames them to end in .java.)
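If pulling in Ant is not wanted, the per-file work of that task can be approximated with the JDK alone. This is a rough sketch, not Ant's API: Native2AsciiFile and convert are hypothetical names, it handles a single file, and it leaves out the includes pattern and extension renaming.

```java
import java.io.IOException;
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;

// Rough plain-Java stand-in for what the native2ascii task does to ONE file
// (this is not Ant's API): decode the file with the given source encoding,
// escape every non-ASCII char, and write the result as pure ASCII.
final class Native2AsciiFile {

    // Each UTF-16 code unit above 0x7F becomes a backslash-u escape with
    // four lowercase hex digits; ASCII chars are copied as-is.
    static String escape(String text) {
        StringBuilder sb = new StringBuilder(text.length());
        for (int i = 0; i < text.length(); i++) {
            char c = text.charAt(i);
            if (c > 0x7F) {
                sb.append(String.format("\\u%04x", (int) c));
            } else {
                sb.append(c);
            }
        }
        return sb.toString();
    }

    // Malformed bytes in the source are decoded as U+FFFD by new String(...).
    static void convert(Path src, Charset srcEncoding, Path dest) throws IOException {
        String text = new String(Files.readAllBytes(src), srcEncoding);
        Files.write(dest, escape(text).getBytes(StandardCharsets.US_ASCII));
    }
}
```

For the EUCJIS example above, the source charset would be something like Charset.forName("EUC-JP"), and you would walk srcdir yourself to pick up the *.eucjis files.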


You can also set up your own ASCII <-> UTF conversion functions, as in this native2ascii Java project (unrelated to the native2ascii Ant task or the native2ascii.exe mentioned above).

extract:

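        // Escapes one char as a Unicode escape (backslash, 'u', four hex
        // digits) if it is outside 7-bit ASCII; other chars pass through.
        private static StringBuilder native2Ascii(char character) {
                StringBuilder sb = new StringBuilder();
                if (character > 127) {
                        sb.append("\\u");
                        int highByte = (character >>> 8);   // upper 8 bits
                        sb.append(String.format("%02x", highByte));
                        int lowByte = (character & 0xFF);   // lower 8 bits
                        sb.append(String.format("%02x", lowByte));
                } else {
                        sb.append(character);
                }
                return sb;
        }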
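The extract only covers one direction. A minimal sketch of the reverse (loading) direction could look like the following; Ascii2Native and ascii2Native are hypothetical names, not taken from that project, and the sketch does not treat a doubled backslash before 'u' as a literal backslash.

```java
// Hypothetical reverse of native2Ascii: turns backslash-u escapes with four
// hex digits back into chars; anything else (including a malformed escape)
// is copied unchanged.
final class Ascii2Native {

    private static final String HEX = "0123456789abcdefABCDEF";

    static String ascii2Native(String text) {
        StringBuilder sb = new StringBuilder(text.length());
        int i = 0;
        while (i < text.length()) {
            if (text.startsWith("\\u", i) && i + 6 <= text.length()
                    && allHex(text, i + 2, i + 6)) {
                sb.append((char) Integer.parseInt(text.substring(i + 2, i + 6), 16));
                i += 6;
            } else {
                sb.append(text.charAt(i));
                i++;
            }
        }
        return sb.toString();
    }

    // True if every char in s[from, to) is an ASCII hex digit.
    private static boolean allHex(String s, int from, int to) {
        for (int k = from; k < to; k++) {
            if (HEX.indexOf(s.charAt(k)) < 0) {
                return false;
            }
        }
        return true;
    }
}
```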

Note (unrelated): for a PDE build, you need to set a special property (javacDefaultEncoding). See this thread.

VonC
There is a predefined mapping table between Unicode symbols and their multi-character ASCII representations; the arrow was just an example. I also don't want to use the integrated Eclipse build system. As far as I know, native2ascii is only useful for Java source code. But thank you very much for the effort anyway.
Axel Gneiting
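The comment mentions a predefined mapping table. Its format isn't given, so this is only a sketch assuming a hypothetical one-entry-per-line text file (hex code point, whitespace, ASCII token); MappingTable and load are illustrative names. It rejects duplicate tokens, since a token shared by two symbols could not be decoded unambiguously when a file is loaded.

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.io.Reader;
import java.util.LinkedHashMap;
import java.util.Map;

// Loads a symbol table in a HYPOTHETICAL text format, one entry per line:
//   <hex code point> <whitespace> <ASCII token>     e.g.  27F6 -->
// Blank lines and lines starting with '#' are ignored.
final class MappingTable {

    static Map<String, String> load(Reader source) throws IOException {
        Map<String, String> symbolToToken = new LinkedHashMap<>();
        Map<String, String> tokenToSymbol = new LinkedHashMap<>();
        try (BufferedReader in = new BufferedReader(source)) {
            String line;
            int lineNo = 0;
            while ((line = in.readLine()) != null) {
                lineNo++;
                line = line.trim();
                if (line.isEmpty() || line.startsWith("#")) {
                    continue;
                }
                String[] parts = line.split("\\s+", 2);
                if (parts.length != 2 || !isAscii(parts[1])) {
                    throw new IOException("line " + lineNo + ": expected '<hex> <ASCII token>'");
                }
                String symbol = new String(Character.toChars(Integer.parseInt(parts[0], 16)));
                // Tokens must be unique, otherwise loading a file could not
                // tell which symbol a token stands for.
                if (tokenToSymbol.putIfAbsent(parts[1], symbol) != null) {
                    throw new IOException("line " + lineNo + ": duplicate token " + parts[1]);
                }
                symbolToToken.put(symbol, parts[1]);
            }
        }
        return symbolToToken;
    }

    private static boolean isAscii(String s) {
        for (int k = 0; k < s.length(); k++) {
            if (s.charAt(k) > 0x7F) {
                return false;
            }
        }
        return true;
    }
}
```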