This is kinda a general question, open for opinions. I've been trying to come up with a good way to design for localization of string resources for a Windows MFC application and related utilities. My wishlist is:

  • Must preserve string literals in code (as opposed to replacing them with macro #define resource IDs), so that the messages are still readable inline
  • Must allow localized string resources (duh)
  • Must not impose additional run-time environment restrictions (eg: dependency on .NET, etc.)
  • Should have minimal obtrusion into existing code (the less modification the better)
  • Should be debuggable
  • Should generate resource files which are editable by common tools (ie: common format)
  • Should not use copy/paste comment blocks to preserve literal strings in code, or anything else which creates the potential for de-synchronization
  • Would be nice to allow static (compile-time) checking that every "notated" string is in the resource file(s)
  • Would be nice to allow cross-language resource string pooling (for components in various languages, eg: native C++ and .NET)

I have a way which fulfills all my wishlist to some extent except for static checking, but I have had to develop a bit of custom code to achieve it (and it has limitations). I'm wondering if anyone has solved this problem in a particularly good way.

Edit: The solution I currently have looks like this:

ShowMessage( RESTRING( _T("Some string") ) );
ShowMessage( RESTRING( _T("Some string with variable %1"), sNonTranslatedStringVariable ) );

I then have a custom utility to parse out the strings from within the 'RESTRING' blocks and put them into a .resx file for localization, and a separate C# COM object to load them from localized resource files with fallback. If the C# object is not available (or cannot load), I fall back to the string in the code. The macro expands to a template class which calls the COM object and does the formatting, etc.

Anyway, I thought it would be useful to add what I have now for reference.
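
For reference, the fallback behaviour described above (use the localized string if a provider is available, otherwise the literal in the code) could be sketched roughly as below. This is a minimal sketch under stated assumptions, not the actual implementation: `StringProvider`, `Provider()` and `Localize` are hypothetical names, a plain `std::function` stands in for the C# COM object, and `std::wstring` with `L"..."` literals is used in place of `_T()` so the example compiles without MFC. Argument formatting is left out.

```cpp
#include <functional>
#include <string>

// Hypothetical lookup hook. In the setup described above this would be the
// call into the COM object that reads the localized .resx resources.
// It returns true and fills `out` when a translation exists.
using StringProvider =
    std::function<bool(const std::wstring& key, std::wstring& out)>;

// Empty by default, which models "the provider is not available".
inline StringProvider& Provider() {
    static StringProvider provider;
    return provider;
}

// Returns the translation if the provider has one, else the literal itself.
inline std::wstring Localize(const wchar_t* literal) {
    std::wstring translated;
    if (Provider() && Provider()(literal, translated)) {
        return translated;
    }
    return literal;  // fall back to the string in the code
}

// The literal stays readable inline and is easy for an extraction tool to find.
#define RESTRING(text) Localize(text)
```

With no provider installed, `RESTRING(L"Some string")` simply yields the literal, so the code keeps working when the localization component is missing.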

A: 

On one project I had localized into 10+ languages, I put everything that was to be localized into a single resource-only dll. At install time, the user selected which dll got installed with their application.

I only had to deliver the English dll to the localization team. They returned a localized dll to me for each language which I included in the build.

I know it's not perfect, but it worked.

BoltBait
Did you figure out a way to do this without replacing strings in code with resource IDs? What you're describing sounds like the conventional method, which certainly can work, but isn't really what I'm looking for.
Nick
No, we don't hard-code strings into our apps. What we do see is a descriptive constant that relates to the resource id.
BoltBait
+1  A: 

I don't know much about how this is normally done on Windows, but the way localized strings are handled in Apple's Cocoa framework works pretty well. They have a very basic text-format file that you can send to a translator, and some preprocessor macros to retrieve the values from the files.

In your code, you'll see the strings in your native language, rather than as opaque IDs.

Mark Bessey
This is basically what I'm looking for, but in something which works with plain C++ (without a framework).
Nick
Yeah, I figured it might be like what you're looking for. Unfortunately, I don't know of any pure C++ versions of such a design, but it doesn't seem (to me) like it'd be that hard to write one. The key simplification there is to have a "genstrings" tool that scans the source and creates the files.
Mark Bessey
+2  A: 

We use the English string as the ID.

If the lookup in the international resource object (loaded from the installed I18N dll) fails, then we default to the ID string.

Code looks like:

doAction(I18N.get("Press OK to continue"));

As part of the build process we have a Perl script that parses all source for string constants. It builds a temp file of all strings in the application and then compares these against the resource strings in each locale to see if they exist. Any missing strings generate an e-mail to the appropriate translation team.
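
The build-time check can be sketched as follows. The real step is a Perl script; this is only an illustrative C++ stand-in, and `extractKeys`, `missingKeys` and the regular expression (which only matches simple `I18N.get("...")` calls with no escaped quotes inside the literal) are assumptions, not the actual script.

```cpp
#include <regex>
#include <set>
#include <string>
#include <vector>

// Collects the literal passed to every I18N.get("...") call in the source text.
std::set<std::string> extractKeys(const std::string& source) {
    static const std::regex call("I18N\\.get\\(\\s*\"([^\"]*)\"\\s*\\)");
    std::set<std::string> keys;
    for (std::sregex_iterator it(source.begin(), source.end(), call), end;
         it != end; ++it) {
        keys.insert((*it)[1].str());
    }
    return keys;
}

// Returns the keys used in code that a locale's resource table lacks.
std::vector<std::string> missingKeys(const std::set<std::string>& used,
                                     const std::set<std::string>& translated) {
    std::vector<std::string> missing;
    for (const std::string& key : used) {
        if (translated.count(key) == 0) {
            missing.push_back(key);
        }
    }
    return missing;
}
```

Running `missingKeys` once per locale gives the list that would go into the e-mail for that translation team.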

We can have multiple dlls for each locale. The name of each dll is based on RFC 3066:
language[_territory][.codeset][@modifier]

We try to extract the locale from the machine and be as specific as possible when loading the I18N dll, but fall back to less specific locale variations if the more specific version is not present.

Example:

In the UK: if the locale is en_GB.UTF-8
(I use the term dll loosely not in the specific windows sense).

First look for the I18N.en_GB.UTF-8 dll. If this dll does not exist, fall back to I18N.en_GB. If this dll does not exist, fall back to I18N.en. If this dll does not exist, fall back to I18N.default.

The only exception to this rule is: Simplified Chinese (zh_CN) where the fallback is US English (en_US). If the machine does not support simplified Chinese then it is unlikely to support full Chinese.
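
The lookup order can be sketched like this. It is a minimal sketch, assuming a well-formed `language[_territory][.codeset][@modifier]` name; `fallbackChain` is a hypothetical helper, and the Simplified Chinese exception is deliberately not modelled.

```cpp
#include <string>
#include <vector>

// Builds the lookup order for a locale such as "en_GB.UTF-8", from most to
// least specific, always ending with the default library.
std::vector<std::string> fallbackChain(const std::string& locale) {
    std::vector<std::string> names;
    std::string current = locale;
    while (!current.empty()) {
        names.push_back("I18N." + current);
        // Strip the last component: "@modifier", then ".codeset",
        // then "_territory". Assumes those parts contain no '@', '.' or '_'.
        std::string::size_type cut = current.find_last_of("@._");
        if (cut == std::string::npos) {
            break;
        }
        current.erase(cut);
    }
    names.push_back("I18N.default");
    return names;
}
```

For `"en_GB.UTF-8"` this produces I18N.en_GB.UTF-8, I18N.en_GB, I18N.en, I18N.default, and the loader would try each name in turn until one exists.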

Martin York
Sounds similar to what I'm doing, but how do you extract all the strings to create the resource dlls? Or is it done by hand?
Nick
Ok, so you're doing something very similar to what I'm doing (except I chose resx and you use native resource libraries). Cool deal, makes me feel a lot better about the approach I'm taking. I'll leave it open for more comments, but this seems like a good approach.
Nick
GNU gettext already does all that, so you have just reinvented an (inferior version of the) wheel.
Milan Babuškov
We have found that the English string is sometimes not specific enough to match a single appropriate string in all other languages. How would you handle the case where the English string should be 2 or more different strings depending on the part of the application it is used in?
Greg Domjan
@Greg Domjan: In the English version, use two or more distinct strings, e.g. "Press Button V1" and "Press Button V2". In the English resource file both map to the string "Press Button", while in other languages they map to specific variations. Then you need to add code to decide which string to use.
Martin York
I'm having trouble finding the exact examples, and don't have easy access to the translated strings at the moment. I mean "Foo" in English translates to "Bar" on screen 1 and to "Baz" on screen 2, so simply looking up by "Foo" isn't specific enough for the context. So would you then use "Foo1" and "Foo2" when the localisation points this out, and also map both of these back to "Foo" in the English resources?
Greg Domjan
A: 

Since it is open for opinions, here is how I do it.

My localized text file is a simple tab delimited text file that can be loaded in Excel and edited. The first column is for the define and each column to the right is a subsequent language, for example:

ID              ENGLISH      FRENCH    GERMAN
STRING_YES      YES          OUI       JA
STRING_NO       NO           NON       NEIN

Then in my makefile there is a custom build step that generates a strings.h file and a strings.dat. In my case it builds an enum list for the string IDs and then a binary file with offsets for the text. Since in my app the user can change the language at any time, I have them all in memory, but you could easily have your preprocessor generate a different output file for each language if necessary.

The thing that I like about this design is that if any strings are missing then I would get a compile error whereas if strings were looked up at runtime then you might not know about a missing string in a seldom used part of the code until later.
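
A rough sketch of the header-generating half of such a build step is shown below. It is not the actual tool: `readIds`, `generateHeader`, the `StringId` enum name and the trailing `STRING_COUNT` enumerator are hypothetical, and the binary strings.dat output is not covered.

```cpp
#include <sstream>
#include <string>
#include <vector>

// Reads tab-delimited text (header row first) and returns the ID column.
std::vector<std::string> readIds(const std::string& table) {
    std::vector<std::string> ids;
    std::istringstream lines(table);
    std::string line;
    bool header = true;
    while (std::getline(lines, line)) {
        if (header) {  // skip the "ID  ENGLISH  FRENCH ..." row
            header = false;
            continue;
        }
        std::string id = line.substr(0, line.find('\t'));
        if (!id.empty()) {
            ids.push_back(id);
        }
    }
    return ids;
}

// Emits the text of a header containing one enumerator per string ID.
std::string generateHeader(const std::vector<std::string>& ids) {
    std::string out = "enum StringId {\n";
    for (const std::string& id : ids) {
        out += "    " + id + ",\n";
    }
    out += "    STRING_COUNT\n};\n";
    return out;
}
```

Because the application code refers to the enumerators directly, using an ID that is not in the text file fails at compile time, which is the property described above.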

KPexEA
A: 

You want an advanced utility that I've always wanted to write but never had the time to. If you don't find such a tool, you may want to fall back on my CMsg() and CFMsg() wrapper classes, which make it very easy to pull strings from the resource table. (CFMsg even provides a one-liner FormatMessage wrapper.) And yes, in the absence of the tool you're looking for, keeping a copy of the string in a comment is a good solution. Regarding desynchronisation of the comment, remember that string literals are very rarely changed.

http://www.codeproject.com/KB/string/stringtable.aspx

BTW, native Win32 programs and .NET programs have a totally different resource storage management. You'll have a hard time finding a common solution for both.

Serge - appTranslator
A: 

Your solution is quite similar to the Unix/Linux "gettext" solution. In fact, you would not need to write the extraction routines.

I'm not sure why you want the RESTRING macro to handle multiple arguments. My code (using wxWidgets' support for gettext) looks like this: MyString.Format(_("Some string with variable %ls"), _("variable"));. That is to say, String::Format(...) gets two individually translated arguments. In hindsight, Boost.Format would have been better, but it too would allow boost::format(_("Some string with variable %1%")) % _("variable");

(We use the _() macro for brevity)

MSalters
I wanted the macro to handle inline varargs formatting, mainly for convenience. Otherwise I'd have to put something else around it, possibly with another string variable declaration, which is wasted code.
Nick
A: 

The simple way is to only use string IDs in your code - no literal strings. You can then produce different versions of the .rc file for each language and either create resource-only DLLs or simply different language builds.

There are a couple of shareware utilities to help with localising the .rc file, which handle resizing dialog elements for languages with longer words and warn about missing translations.

A more complicated problem is word order: if you have several values in a printf, they may need to appear in a different order to suit another language's grammar. There are some extended printf classes on codeproject that let you specify things like printf("word %1s and %2s",var1,var2) so you can switch %1s and %2s if necessary.
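
To illustrate the idea of numbered placeholders, here is a minimal sketch. It is not one of the codeproject classes: `formatPositional` is a hypothetical helper, it uses a simplified `%1` to `%9` syntax rather than `%1s`, and it only accepts arguments that are already strings.

```cpp
#include <cctype>
#include <cstddef>
#include <string>
#include <vector>

// Replaces %1..%9 in `format` with the matching entry of `args` (1-based).
// Placeholders with no matching argument are left untouched.
std::string formatPositional(const std::string& format,
                             const std::vector<std::string>& args) {
    std::string out;
    for (std::string::size_type i = 0; i < format.size(); ++i) {
        if (format[i] == '%' && i + 1 < format.size() &&
            std::isdigit(static_cast<unsigned char>(format[i + 1]))) {
            int index = format[i + 1] - '1';  // '1' selects args[0]
            if (index >= 0 && static_cast<std::size_t>(index) < args.size()) {
                out += args[static_cast<std::size_t>(index)];
                ++i;  // skip the digit as well
                continue;
            }
        }
        out += format[i];
    }
    return out;
}
```

With the arguments "3" and "10", the format "page %1 of %2" gives "page 3 of 10", while a translated format such as "%2 pages, now at %1" gives "10 pages, now at 3" without any change to the calling code.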

Martin Beckett