
I work on an app that gets distributed via a single installer containing multiple localizations. The build process includes a script that updates the .ism string table with translations for each supported language.

This works fine for languages like French and German. But when testing the installer in, e.g., Japanese, the text shows up as a series of squares. It's unlikely to be a font problem, since the InstallShield-supplied strings show up fine; only the string table entries are mangled. So the problem seems to be that the strings are in the wrong encoding.

The .ism is in XML format, with UTF-8 declared as its encoding, so I assumed the strings needed to be UTF-8 encoded as well. Do they actually need to use the encoding of the target platform? If so, is there any concern about targets having different encodings, e.g. Chinese systems using one GB encoding versus another? What is the right thing to do here?

Edit: I'm using InstallShield 2009; apparently there is a difference between it and 2010.

+2  A: 

In InstallShield 2009 and earlier, the value is a base-64 encoding of the binary string in the ANSI code page specific to the language in question (e.g. CP932 for Japanese). InstallShield 2010 and later still accept that, or use UTF-8, depending on other columns in that table.

Michael Urman
Doesn't seem to work; it just shows the base-64-encoded text.
DNS
Did you set the other columns? In particular, I think the Encoded column gets the length of the text before it was base-64 encoded.
Michael Urman
+1  A: 

I'm also trying to figure this out...

I've inherited some InstallShield 12 (which is pre-2009) projects with string table entries containing characters outside the range of base64 'target' characters.

For example, one of the Japanese strings is: 4P!H&$9!O'<4!R&\=!E&,=``@$(80!C&L=0!P"00!G`&4`;@!T`)(PI##S,+DPR##\,.LP5S!^,%DP`C

After much searching I happened upon Base85 encoding (http://Wikipedia.org/wiki/Ascii85), which looks more plausible, but I have not yet verified that it is the solution...

Patrick Anderson
See http://community.flexerasoftware.com/showthread.php?p=374730 for some further information. It's not quite the standard base64 you're thinking of.
Michael Urman
Patrick: take a look at my answer for a working algorithm
DNS
Thanks for the pointers. Wow, this is quite a mess. I suddenly realized I must be going about this all wrong, since I'm sure the person I inherited these projects from didn't resort to writing his own application to do this conversion... How does one *normally* enter non-Latin text? Is there really no way to just copy/paste from a Word document? How do translators *usually* do this work?
Patrick Anderson
I think you would normally use either the GUI or InstallShield's VBScript automation interface to export the string table for each language. That produces a file encoded in either UTF-8 or UTF-16 (I forget which). You give that to the translators, they fill it in using a text editor and save it with the same encoding, and then you import the file again. Our build system is a bit more involved, so that wasn't an option; we needed to work directly with the .ism file.
DNS
I exported the ISString table as the file 'ISString.idt', but it has the same strange encoding. Loading it into Orca changes nothing. I wonder what people typically use to edit this stuff. I guess I'll just have to buckle down and write a command-line app to implement the algorithm you describe. I'll make it open source and post a link back here when I'm done.
Patrick Anderson
A: 

Thanks go to Michael Urman (whose answer I up-voted) for pointing us in the right direction. Here, though, is the algorithm that actually works with InstallShield 2009, as reverse-engineered by a co-worker:

  1. Start with a Unicode (multi-byte-character) string
  2. Write out its length as the encoded-length field in the .ism file
  3. Encode the string as UTF-16 little-endian
  4. Base-64 encode the bytes using the uuencode dictionary, except with ` (back-tick) instead of space
  5. Write the result to the .ism file, escaping XML special characters
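A minimal Python (3.9+) sketch of the five steps above. The function names are hypothetical, and two details are assumptions rather than something this answer confirms: the length field is taken to be the character count of the original string, and a final partial 3-byte group is zero-filled, as uuencode does. Step 5 uses `xml.sax.saxutils.escape`, which escapes `&`, `<` and `>`; that is enough if the value is written as element text.

```python
from xml.sax.saxutils import escape, unescape


def _uu_char(value: int) -> str:
    """Map a 6-bit value to the uuencode alphabet, using '`' instead of space for zero."""
    return "`" if value == 0 else chr(value + 32)


def encode_ism_string(text: str) -> tuple[int, str]:
    """Return (encoded_length, xml_escaped_value) for one string table entry.

    Assumption: the length field is the number of characters in `text`.
    """
    length = len(text)                        # step 2
    data = text.encode("utf-16-le")           # step 3
    data += b"\x00" * (-len(data) % 3)        # zero-fill the last 3-byte group
    chars = []
    for i in range(0, len(data), 3):          # step 4: 3 bytes -> four 6-bit values
        group = int.from_bytes(data[i:i + 3], "big")
        for shift in (18, 12, 6, 0):
            chars.append(_uu_char((group >> shift) & 0x3F))
    return length, escape("".join(chars))     # step 5: escape &, < and >


def decode_ism_string(length: int, value: str) -> str:
    """Inverse of encode_ism_string, useful for checking round trips."""
    raw = unescape(value)
    data = bytearray()
    for i in range(0, len(raw), 4):
        group = 0
        for ch in raw[i:i + 4]:
            # '`' (0x60) and space (0x20) both map to 0 after masking
            group = (group << 6) | ((ord(ch) - 32) & 0x3F)
        data += group.to_bytes(3, "big")
    # Drop a trailing odd padding byte, decode, then cut off padding characters.
    even = bytes(data[: len(data) // 2 * 2])
    return even.decode("utf-16-le")[:length]
```

With this sketch, `encode_ism_string("インストール")` returns the length 6 and the value `I##S,+DPR##\,.LP`. That same run of characters appears in the InstallShield 12 sample string quoted in the other answer, which suggests those older projects use the same scheme.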

Be aware that base-64 encoding with the uuencode dictionary is not the same as using the uuencode algorithm. Standard uuencode produces a set of newline-separated lines, including a header, a footer, and one or more data lines, each of which begins with a length character. If you're implementing this using a uuencode codec, you'll need to strip all of that off.
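If you would rather reuse a codec, here is a Python sketch of that stripping using the standard library's `binascii.b2a_uu` (its `backtick` keyword needs Python 3.7+). `b2a_uu` encodes a single line of at most 45 bytes, so it never emits the `begin`/`end` header and footer, but each returned line still starts with a length character and ends with a newline. The function name `uu_alphabet_b64` is hypothetical.

```python
import binascii


def uu_alphabet_b64(data: bytes) -> str:
    """Base-64 `data` with the uuencode alphabet ('`' for zero), minus uuencode framing."""
    parts = []
    for i in range(0, len(data), 45):   # b2a_uu accepts at most 45 bytes per line
        line = binascii.b2a_uu(data[i:i + 45], backtick=True)
        # Each line is: one length character + encoded data + b"\n".
        # Keep only the encoded data.
        parts.append(line[1:-1].decode("ascii"))
    return "".join(parts)


encoded = uu_alphabet_b64("インストール".encode("utf-16-le"))
```

Every chunk except the last is 45 bytes, a multiple of 3, so no padding is inserted mid-stream and concatenating the stripped lines gives the same result as encoding the whole buffer in one go.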

DNS