I have a Java application that uses a C++ DLL via JNI. A few of the DLL's methods take string arguments, and some of them return objects that contain strings as well.
Currently the DLL does not support Unicode, so the string handling is rather easy:
- Java calls String.getBytes() (which uses the platform default charset) and passes the resulting array to the DLL, which simply treats the data as a char*.
- DLL uses NewStringUTF() to create a jstring from a const char* (a rough sketch of this is below).
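In JNI terms, the current setup looks roughly like this (a simplified sketch; the class and method names are just placeholders, not my real exports):

```cpp
// Simplified sketch of the current (non-Unicode) JNI code.
#include <jni.h>
#include <string>

extern "C" JNIEXPORT void JNICALL
Java_com_example_NativeLib_sendText(JNIEnv* env, jobject, jbyteArray data)
{
    jsize len = env->GetArrayLength(data);
    jbyte* bytes = env->GetByteArrayElements(data, nullptr);

    // The bytes come straight from String.getBytes(); treat them as a char*.
    std::string text(reinterpret_cast<const char*>(bytes), len);
    env->ReleaseByteArrayElements(data, bytes, JNI_ABORT);

    // ... hand text.c_str() to the rest of the DLL ...
}

extern "C" JNIEXPORT jstring JNICALL
Java_com_example_NativeLib_getText(JNIEnv* env, jobject)
{
    const char* result = "narrow string produced by the DLL";
    // Fine as long as the data is effectively ASCII / modified UTF-8.
    return env->NewStringUTF(result);
}
```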
I'm now in the process of modifying the DLL to support Unicode by switching to the TCHAR type (which, when UNICODE is defined, maps to Windows' WCHAR type). Modifying the DLL itself is going well, but I'm not sure how to modify the JNI portion of the code.
The only thing I can think of right now is this:
- Java calls String.getBytes(String charsetName) and passes the resulting array to the DLL, which treats the data as a wchar_t*.
- DLL no longer creates Strings; instead it returns jbyteArrays containing the raw string data, and Java uses the String(byte[] bytes, String charsetName) constructor to actually create the String (a rough sketch of the native side is below).
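On the native side that would look something like this (again a simplified sketch with placeholder names; it assumes wchar_t is 2 bytes, as it is with MSVC on Windows):

```cpp
// Simplified sketch of the proposed Unicode pattern.
#include <jni.h>
#include <string>

extern "C" JNIEXPORT void JNICALL
Java_com_example_NativeLib_sendText(JNIEnv* env, jobject, jbyteArray data)
{
    jsize lenBytes = env->GetArrayLength(data);
    jbyte* bytes = env->GetByteArrayElements(data, nullptr);

    // Reinterpret the bytes from String.getBytes(charsetName) as wide chars
    // (assumes 2-byte wchar_t).
    std::wstring text(reinterpret_cast<const wchar_t*>(bytes),
                      lenBytes / sizeof(wchar_t));
    env->ReleaseByteArrayElements(data, bytes, JNI_ABORT);

    // ... hand text.c_str() to the rest of the DLL ...
}

extern "C" JNIEXPORT jbyteArray JNICALL
Java_com_example_NativeLib_getTextBytes(JNIEnv* env, jobject)
{
    std::wstring result = L"wide string produced by the DLL";

    jsize lenBytes = static_cast<jsize>(result.size() * sizeof(wchar_t));
    jbyteArray out = env->NewByteArray(lenBytes);
    env->SetByteArrayRegion(out, 0, lenBytes,
                            reinterpret_cast<const jbyte*>(result.data()));
    return out;  // Java side: new String(bytes, charsetName)
}
```

On the Java side that pairs with new String(bytes, charsetName), which brings me to the charset question.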
The only problem with this method is that I'm not sure what charset name to use. WCHARs are 2 bytes long, so I'm pretty sure it's UTF-16, but there are three possibilities on the Java side: UTF-16, UTF-16BE, and UTF-16LE. I haven't found any documentation that says what the byte order is, but I can probably figure it out from some quick testing.
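By quick testing I mean something as trivial as dumping the raw bytes of a wide string on the native side, e.g.:

```cpp
// Quick check of the in-memory byte order of WCHAR data on this machine.
#include <windows.h>
#include <cstdio>

int main()
{
    const WCHAR text[] = L"AB";  // U+0041 U+0042

    // Little-endian UTF-16 prints "41 00 42 00", big-endian "00 41 00 42".
    const unsigned char* raw = reinterpret_cast<const unsigned char*>(text);
    for (unsigned i = 0; i < 2 * sizeof(WCHAR); ++i)
        std::printf("%02X ", raw[i]);
    std::printf("\n");
    return 0;
}
```

That only tells me what the DLL produces, though; which of the three Java charset names lines up with it is the part I'm unsure about.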
Is there a better way? If possible, I'd like to continue constructing the jstring objects within the DLL, as that way I won't have to modify any of the callers of those methods. However, the NewString JNI method doesn't take a charset identifier.
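For reference, the signature in question is NewString(const jchar*, jsize): it takes UTF-16 code units and a length, but no charset name, so the only thing I can see doing inside the DLL is something like this (a sketch that assumes WCHAR and jchar can be treated as the same 16-bit unit):

```cpp
// Sketch of building the jstring directly in the DLL, assuming WCHAR and
// jchar are both unsigned 16-bit UTF-16 code units.
#include <jni.h>
#include <string>

jstring MakeJString(JNIEnv* env, const std::wstring& text)
{
    // NewString takes UTF-16 code units plus a length -- no charset parameter.
    return env->NewString(reinterpret_cast<const jchar*>(text.data()),
                          static_cast<jsize>(text.size()));
}
```

Is relying on that cast reasonable, or is there a cleaner approach?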