In the current C++ standard (C++03), very little is specified about text localization, which makes a C++ developer's life harder than usual when working with localized texts. (The C++0x standard will certainly help here later.)

Assume the following scenario (taken from real PC/Mac game development cases):

  1. responsive (real-time) application: the application has to keep unresponsive periods down to "not noticeable", so execution speed is important.
  2. localized texts: displayed texts are localized in more than two languages, and potentially many more. Don't expect a fixed number of languages; the system should be easily extensible.
  3. language defined at runtime: the texts should not be compiled into the application (nor should there be one application per language). You get the chosen language at application launch, which implies some kind of text loading.
  4. cross-platform: the application is coded with cross-platform support in mind (Windows, Linux/Ubuntu, Mac/OS X), so the localized text system has to be cross-platform too.
  5. stand-alone application: the application provides everything it needs to run; it won't use any environment library or require the user to install anything other than the OS (like most games, for example).

What are the best practices to manage localized texts in C++ in this kind of application?

I did a little research on this last year, and so far the only thing I'm sure of is that "you should use std::wstring or std::basic_string<ABigEnoughType> to manipulate the texts in the application". I stopped my research because I was working more on the "text display" problem (in the case of real-time 3D), but I guess there are more best practices for managing localized texts in raw C++ than just that and "using Unicode".

So all best practices, suggestions and information (cross-platform makes it hard, I think) are welcome!

+2  A: 

GNU Gettext does it all.

Milan Babuškov
It's not cross-platform, is it?
Klaim
It seems it is; there are sections for C#, Java and Objective-C as well as the usual Linux languages.
gbjbaanb
Yes, gettext is cross-platform. I use it at work on both Linux and Windows.
David Alfonso
+1  A: 

Asked and answered:

http://stackoverflow.com/questions/185291/best-way-to-design-for-localization-of-strings#185356

Martin York
Thanks for pointing out that question; it's pretty similar (if you ignore the MFC usage). But the answers don't seem totally right to me... the current top answer requires compile-time localization...
Klaim
A: 

There won't be any additional features in the C++0x standard, as far as I can tell. I suspect the Committee considers this a matter for third-party libraries.

David Thornley
Won't Unicode encoded characters help? http://en.wikipedia.org/wiki/C%2B%2B0x#New_string_literals
Klaim
Thank you, I had missed that change. Of course, there's a lot it doesn't cover.
David Thornley
+4  A: 

At a small video game company, Black Lantern Studios, I was the lead developer for a game called Lionel Trains DS. We localized into English, Spanish, French, and German. We knew all the languages up front, so including them at compile time was the only option. (They are burned to a ROM, you see.)

I can give you information on some of the things we did. Our strings were loaded into an array at startup based on the language selection of the player. Each individual language went into a separate file with all the strings in the same order. String 1 was always the title of the game, string 2 always the first menu option, and so on. We keyed the arrays off of an enum, as integer indexing is very fast, and in games, speed is everything. (The solution linked in one of the other answers uses string lookups, which I would tend to avoid.) When displaying the strings, we used a printf()-type function to replace markers with values, as in "Train 3 is departing city 1."
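To make the enum-indexed array idea concrete, here is a minimal sketch. It is not the code from the game: the enum entries, the StringTable class and the one-string-per-line text format are all assumptions made for illustration (the answer above describes custom binary files instead).

```cpp
#include <cassert>
#include <cstddef>
#include <istream>
#include <sstream>
#include <string>
#include <vector>

// Hypothetical string IDs. The order must match the line order of every
// language file, so in practice this enum would be generated by a tool.
enum StringId {
    STR_GAME_TITLE,
    STR_MENU_START,
    STR_TRAIN_DEPARTING,
    STR_COUNT  // number of strings, not a real entry
};

// Holds the strings of the single language chosen at startup.
class StringTable {
public:
    // Reads one string per line, in enum order. Returns false (and keeps the
    // previous contents) if the stream does not hold exactly STR_COUNT lines.
    bool load(std::istream& in) {
        std::vector<std::string> loaded;
        std::string line;
        while (std::getline(in, line)) {
            loaded.push_back(line);
        }
        if (loaded.size() != static_cast<std::size_t>(STR_COUNT)) {
            return false;
        }
        strings_.swap(loaded);
        return true;
    }

    // Convenience for data that is already in memory (e.g. read from an archive).
    bool loadFromText(const std::string& text) {
        std::istringstream in(text);
        return load(in);
    }

    // Lookup is a plain array index: no hashing, no string comparison.
    const std::string& get(StringId id) const {
        assert(static_cast<std::size_t>(id) < strings_.size());
        return strings_[id];
    }

private:
    std::vector<std::string> strings_;
};
```

At launch you would open the file for the selected language, call load() once, and then use get(STR_GAME_TITLE) and friends everywhere text is displayed. Switching language is just another load() call.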

Now for some of the pitfalls.

1) Between languages, phrase order can be completely different. "Train 3 is departing city 1." translated to German and back ends up being "From City 1, Train 3 is departing". If you are using something like printf() and your string is "Train %d is departing city %d.", the German will end up saying "From City 3, Train 1 is departing.", which is completely wrong. We solved this by forcing the translation to retain the same word order, but we ended up with some pretty broken German. Were I to do it again, I would write a function that takes the string and a zero-based array of the values to put in it. Then I would use markers like %0 and %1, basically embedding the array index into the string. Update: @Jonathan Leffler pointed out that a POSIX-compliant printf() supports %2$s-type markers, where the 2$ portion instructs printf() to fill that marker with the second additional parameter. That would be quite handy, so long as it is fast enough. A custom solution may still be faster, so you'll want to make sure to test both.
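The "%0 and %1" function mentioned above could look roughly like this. It is only a sketch: formatPositional is a hypothetical name, values are assumed to be converted to strings by the caller, and only single-digit indices (%0 to %9) are supported.

```cpp
#include <cstddef>
#include <string>
#include <vector>

// Replaces %0..%9 in `pattern` with args[0]..args[9]; "%%" yields a literal '%'.
// A marker whose index has no matching argument is copied through unchanged.
std::string formatPositional(const std::string& pattern,
                             const std::vector<std::string>& args) {
    std::string out;
    out.reserve(pattern.size());
    for (std::size_t i = 0; i < pattern.size(); ++i) {
        const char c = pattern[i];
        if (c == '%' && i + 1 < pattern.size()) {
            const char next = pattern[i + 1];
            if (next == '%') {
                out += '%';
                ++i;  // consume the second '%'
                continue;
            }
            if (next >= '0' && next <= '9') {
                const std::size_t index = static_cast<std::size_t>(next - '0');
                if (index < args.size()) {
                    out += args[index];
                    ++i;  // consume the digit
                    continue;
                }
            }
        }
        out += c;
    }
    return out;
}
```

With args holding "3" and "1", the English pattern "Train %0 is departing city %1." and a reordered pattern such as "From city %1, train %0 is departing." both pick up the right values, because the index travels with the marker rather than with its position in the sentence.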

2) Languages vary greatly in length. What was 30 characters in English sometimes came out to as much as 110 characters in German. This meant it often would not fit the screens we were putting it on. This is probably less of a concern for PC/Mac games, but if you are doing any work where the text must fit in a defined box, you will want to consider this. To solve this issue, we stripped as many adjectives from our text as possible for other languages. This shortened the sentence but preserved the meaning, while losing a bit of the flavor. I later designed an application that would contain the font and the box size and allow the translators to make their own modifications to get the text to fit into the box. Not sure if they ever implemented it. You might also consider having scrolling areas of text if you have this problem.

3) As far as cross-platform goes, we wrote pretty much pure C++ for our localization system. We wrote custom-encoded binary files to load, and a custom program to convert a CSV of language text into a .h with the enum and the file-to-language map, plus a .lang file for each language. The most platform-specific things we used were the fonts and the printf() function, but you will have something suitable wherever you are developing, or could write your own if needed.
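As a rough idea of the CSV-to-header step, here is a small sketch that only covers the enum part (not the binary .lang files or the file-to-language map). The CSV layout, the STR_ prefix and the generateEnumHeader name are assumptions, not the actual tool.

```cpp
#include <istream>
#include <sstream>
#include <string>

// Builds the text of a C++ header declaring one enum entry per CSV data row.
// Assumed (hypothetical) CSV layout: the first row is a header such as
// "key,en,fr,de"; every following row starts with an identifier-safe key.
// Quoted fields and embedded commas are NOT handled in this sketch.
std::string generateEnumHeader(std::istream& csv) {
    std::ostringstream out;
    out << "// Generated file, do not edit.\n";
    out << "enum StringId {\n";

    std::string row;
    std::getline(csv, row);  // skip the header row
    while (std::getline(csv, row)) {
        if (row.empty()) {
            continue;  // tolerate blank lines
        }
        // The key is everything before the first comma (or the whole row).
        const std::string key = row.substr(0, row.find(','));
        out << "    STR_" << key << ",\n";
    }

    out << "    STR_COUNT\n";
    out << "};\n";
    return out.str();
}
```

Because the enum and the per-language files come from the same rows in the same order, the array index and the enum value cannot drift apart as strings are added.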

Aaron
Note that POSIX-compliant versions of the printf() family support the '%2$s' notation to say "this string format item comes from argument 2" (the '2$' part). This allows you to internationalize the order if you use different format strings for different locales.
Jonathan Leffler
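For reference, a minimal sketch of the numbered-argument notation. It assumes a POSIX-conforming snprintf (e.g. glibc); the notation is a POSIX extension rather than ISO C or C++, so it is not guaranteed on every platform. formatDeparture and the 128-byte buffer are made up for the example.

```cpp
#include <cstdio>
#include <string>

// Formats a departure message from a per-locale pattern that uses POSIX
// numbered conversions: %1$d is the train, %2$d is the city.
// Assumption: the C library implements the POSIX "%n$" extension.
std::string formatDeparture(const char* pattern, int train, int city) {
    char buffer[128];
    const int written = std::snprintf(buffer, sizeof buffer, pattern, train, city);
    if (written < 0) {
        return std::string();  // encoding error reported by snprintf
    }
    return std::string(buffer);  // output is truncated if longer than the buffer
}
```

The English pattern would be "Train %1$d is departing city %2$d." and a reordered translation "From city %2$d, train %1$d is departing."; the call site passes (train, city) in the same order for both. Note that a pattern must not mix numbered and unnumbered conversions.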
Oh? I was not aware of that. I'm pretty certain we didn't have that on the DS, but that would certainly be a good place to start on PC/Mac. With any solution, you would want to make sure it is fast enough for your uses. Various implementations of printf() may be too slow for your needs. YMMV. =D
Aaron
This is almost exactly how we do it at Halfbrick Studios. The only difference is that we have a few specialized tags (for things like inserting text or changing font colour) and we build a hash table for the "name" of each string, which allows them to be accessed via scripts.
Grant Peters
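A name-to-index hash table layered on top of the indexed array might look like the sketch below. The NamedStrings class and its methods are hypothetical, and std::unordered_map requires C++11 (with a C++03 compiler you would need a library hash map or an ordered std::map instead).

```cpp
#include <cstddef>
#include <string>
#include <unordered_map>
#include <vector>

// Keeps localized texts in an array for fast indexed access and adds a
// hash table from string "name" to array index for script access.
class NamedStrings {
public:
    // Stores `text` at the next free index and registers `name` for it.
    // Returns the index so native code can keep using integer lookups.
    std::size_t add(const std::string& name, const std::string& text) {
        const std::size_t index = texts_.size();
        texts_.push_back(text);
        indexByName_[name] = index;
        return index;
    }

    // Fast path for native code: direct index (throws if out of range).
    const std::string& byIndex(std::size_t index) const {
        return texts_.at(index);
    }

    // Script path: hash lookup by name; returns nullptr if the name is unknown.
    const std::string* byName(const std::string& name) const {
        const std::unordered_map<std::string, std::size_t>::const_iterator it =
            indexByName_.find(name);
        if (it == indexByName_.end()) {
            return nullptr;
        }
        return &texts_[it->second];
    }

private:
    std::vector<std::string> texts_;
    std::unordered_map<std::string, std::size_t> indexByName_;
};
```

Scripts pay for one hash lookup per access (or can cache the index), while engine code keeps the cheap integer indexing.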