We use gettext for translations in our product, but have had a lot of problems with it:

  • Can't use a language unless the system supports it.

On Solaris 9 Sparc, if we reset the environment to various English locales, the message still won't be translated, if the machine doesn't have the corresponding locale. The translation file is present, but we can't access it.

  • Uses environment to work out language

This causes problems in servers that want to translate messages into different languages. In theory this could be an entirely thread-safe, parallelisable operation - but gettext means we have to have a global lock around translation.

  • Can't set a default language

By this I don't mean the text in the code. We use MsgIDs in the code, so what I want is to be able to specify a fall-back translation to go to if the language defined by the current environment is unavailable. But gettext doesn't allow that - I have to try, then reset the environment before it will deign to look at a different translation. (Using MsgIDs wasn't my choice - I wanted to follow gettext standards and use English as the IDs, but I was overruled, and it would be a lot of work to change it now.)

  • The encoding of the returned strings varies between UTF-8 and the current local encoding.

I don't mean the .po files - they are all in UTF-8 (annoying that msgfmt doesn't handle a BOM, but whatever). I mean the output of gettext, ngettext, etc., which is in UTF-8 (regardless of local/terminal encoding) on AIX and HPUX, but in the local encoding on Solaris/Linux/FreeBSD, although that might be due to iconv issues?

In any case it would be nice not to have to have special code for different platforms - I'll have to investigate if I can get bind_textdomain_codeset(domain,codepage); to help against this problem.
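A minimal sketch of what that investigation might look like in C, assuming a libintl that provides bind_textdomain_codeset (GNU gettext and glibc do; whether every platform listed above behaves the same way is untested here). The helper name init_utf8_catalog is hypothetical, and the domain name and catalog directory are placeholders supplied by the caller.

```c
#include <libintl.h>
#include <locale.h>
#include <stddef.h>

/* Hypothetical helper: bind `domain` to the catalogs under `localedir` and
 * ask libintl to return every message from that domain as UTF-8, whatever
 * the locale's own encoding is. Returns 0 on success, -1 on failure. */
int init_utf8_catalog(const char *domain, const char *localedir)
{
    /* Pick up the locale from the environment, as gettext expects. */
    setlocale(LC_ALL, "");

    if (bindtextdomain(domain, localedir) == NULL)
        return -1;

    /* The output codeset is requested per domain; NULL is treated as failure. */
    if (bind_textdomain_codeset(domain, "UTF-8") == NULL)
        return -1;

    return textdomain(domain) != NULL ? 0 : -1;
}
```

Calling something like init_utf8_catalog("myapp", "/usr/share/locale") once at start-up (both arguments are placeholders) should, on implementations that honour the call, remove the need for per-platform conversion code.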


Does anyone know of any open-source translation libraries that provide a more useful interface?

+4  A: 

We are using ICU resource bundles and are pretty satisfied with it. The ICU interface is not "modern", but it is powerful, the underlying principles are sound, and resources packaging (with the genrb tool) is pretty flexible. Its message formatting capabilities are also good.

About your specific comments:

Can't use a language unless the system supports it.

I don't understand this one. This may be due to the fact that the only "experience" I have with gettext is having read its documentation.

Uses environment to work out language

The ICU interface takes a Locale as input, so you have complete control. It also has a concept of "default locale" if it is more convenient to you.

Can't set a default language

ICU has an elaborate fallback mechanism, involving a "default" bundle

The encoding of the returned strings varies between UTF-8 and the current local encoding.

String ResourceBundles (other data types are also possible) are always represented as UnicodeString, which is internally encoded in UTF-16. Working with UTF-32 through UnicodeString is pretty easy, as its interface exposes several methods for manipulating it at the code point level. For other encodings, code conversion is possible.

Éric Malenfant
I hadn't seen the ICU translation stuff before - I thought it was just for converting encodings... Definitely interesting, since I think ICU might become a dependency of ours at some point anyway.
Douglas Leeder
About the system dependency of gettext: for example, you can't translate a message into Japanese unless the system locale ja_JP.UTF-8 is supported, even if you are running on a UTF-8 system that might very well be able to display Japanese.
Douglas Leeder
Or even a server that wants to be able to generate Japanese, e.g. for an email, requires that the locale be available just to get a translation message.
Douglas Leeder
+1  A: 

You also can convert ICU resource bundles to and from the XML-based XLIFF format for translation.

Steven R. Loomis
Now if ICU had the ability to read .mo files, that would be even better...
Douglas Leeder
One way would be to use:
.mo to .po: http://weblogtoolscollection.com/archives/2007/03/06/wp-translations-mo-and-po-files/
.po to xliff: http://translate.sourceforge.net/wiki/toolkit/xliff2po
Maybe not what's desired. You might file a bug over at ICU.
Steven R. Loomis
A: 

1. Can't use a language unless the system supports it.

Wrong. You may specify the language manually, using the LANGUAGE environment variable:

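#include <locale.h>
#include <stdlib.h>

int main()
{
      setlocale(LC_ALL,"");
      /* setenv takes a third argument: non-zero overwrites an existing value */
      setenv("LANGUAGE","foo",1);
}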

This works, even if the locale does not exist (have you ever seen language foo?)

2. Uses environment to work out language

What is the problem with that? This gives the user more control.

3. Can't set a default language

Wrong, see above.

4. The encoding of the returned strings varies between UTF-8 and the current local encoding.

Wrong, see bind_textdomain_codeset(domain,codepage);

My strong recommendation -- stay with gettext. It is one of the best-supported and best tools around. Translators will be thankful to be able to use a normal, useful tool.

There is another important point: gettext has great support for plural forms, which are quite badly supported in non-gettext-based tools.
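For illustration, a small sketch of that plural support in C. ngettext takes a singular msgid, a plural msgid and a count, and returns whichever form the catalog's plural rule selects; with no catalog loaded it falls back to the English-style singular/plural choice. The helper name format_file_count and the message strings are made up for this example.

```c
#include <libintl.h>
#include <stddef.h>
#include <stdio.h>

/* Hypothetical helper: write "<n> file(s)" into buf in the current language.
 * Returns what snprintf returns. */
int format_file_count(char *buf, size_t buflen, unsigned long n)
{
    /* ngettext picks the plural form for n according to the loaded catalog;
     * without a catalog it returns the first string for n == 1 and the
     * second string otherwise. */
    const char *fmt = ngettext("%lu file", "%lu files", n);
    return snprintf(buf, buflen, fmt, n);
}
```

Languages with more than two plural forms are handled by the Plural-Forms header of the .po file, so the calling code stays the same.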


There is only one limitation of gettext: you can't use more than one language per process, because switching the language is not thread-safe. Fortunately, most programs that interact with human beings speak only one language.

This may be a limitation only for multi-threaded services.
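The workaround the question describes, a global lock around translation, might look roughly like this in C. This is a sketch, not a recommendation: translate_locked is a hypothetical name, it assumes a GNU-style libintl that consults the LANGUAGE variable on each lookup, and it only protects callers that go through this one function (setenv itself is not safe against other threads reading the environment).

```c
#include <libintl.h>
#include <pthread.h>
#include <stdlib.h>
#include <string.h>

/* One process-wide lock: the language is global state, so every
 * translation has to be serialised. */
static pthread_mutex_t translation_lock = PTHREAD_MUTEX_INITIALIZER;

/* Hypothetical helper: translate msgid from `domain` into `language` and
 * copy the result into buf. Returns 0 on success, -1 if buf is too small.
 * Assumption: GNU gettext ignores LANGUAGE while the locale is "C", so a
 * real program would call setlocale(LC_ALL, "") at start-up. */
int translate_locked(const char *language, const char *domain,
                     const char *msgid, char *buf, size_t buflen)
{
    int rc = -1;

    pthread_mutex_lock(&translation_lock);
    setenv("LANGUAGE", language, 1);

    /* dgettext returns the msgid itself when no translation is found. */
    const char *text = dgettext(domain, msgid);
    size_t len = strlen(text);
    if (len < buflen) {
        memcpy(buf, text, len + 1);   /* copy while still holding the lock */
        rc = 0;
    }

    pthread_mutex_unlock(&translation_lock);
    return rc;
}
```

Every request thread contends on the same mutex, which is exactly the serialisation the question complains about.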

EDIT: But even this is not a real problem. I once implemented a thread-safe gettext version for my project, based on a reader for mo files. See http://art-blog.no-ip.info/cppcms/blog/post/16.

Artyom
mo files aren't the problem - the libraries for accessing them are. So I will look with interest at your library - that looks like precisely what I am after.
Douglas Leeder
A: 

The language switching on the Google Chrome browser is very neatly done. It's possible to switch between languages while the program is running. I don't know what system they use, but it may be worth investigating, since it's an open source browser.

Kinopiko
A: 

Can't use a language unless the system supports it.

That has nothing to do with GNU gettext, because it only handles the translation part. But it's true that if the system is not able to show any Chinese characters, then you will have problems with China.

Uses environment to work out language

That's a good choice, but you can always set the language yourself, overriding the environment. This way, you can make it use any language, based on your choice.

Can't set a default language

That's incorrect - default language is always the built-in language, and if you want to have another language, just switch to it. It simply cannot be simpler than one line of code.

The encoding of the returned strings varies between UTF-8 and the current local encoding.

If you're in a position to pick an internationalization tool, then you are also in a position to choose which character encoding you want to use for your texts. Some projects use utf-8 for all languages (my preference), some use the locale encoding.

Does anyone know of any open-source translation libraries that provide a more useful interface?

No, sorry - I fail to see any problem with GNU gettext :-)

Lars D
Maybe you should ask how to solve your problems instead of skipping the tool... you may be surprised how good it is.
Lars D
When you say the default is the built-in language - you mean the text that was put in the code? We use MsgIDs in the code, so what I want is to be able to specify a fall-back **translation** to go to if the language defined by the current environment is unavailable. But gettext doesn't allow that - I have to try, then reset the environment before it will deign to look at a different translation.
Douglas Leeder
Using the environment to work out the language is fine for command-line programs, but fails for servers. And fails if the built-in text is not in a default language.
Douglas Leeder
Encoding: I don't mean the .po files - they are all in UTF-8 (annoying that msgfmt doesn't handle a BOM, but whatever). I mean the output of gettext, ngettext, etc., which is in UTF-8 (regardless of local/terminal encoding) on AIX and HPUX, but in the local encoding on Solaris/Linux/FreeBSD, although that might be due to iconv issues?
Douglas Leeder
Fallback languages: There is nothing wrong in doing things like: if (filedoesnotexist('fr/LC_MESSAGES/default.mo')) SetLanguage ('de'); - this way, it chooses to use the German translation if the French is not present.
Lars D
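The fallback check in the comment above could be sketched in C roughly as follows. The function name choose_language is hypothetical, the directory layout follows the fr/LC_MESSAGES/default.mo pattern from the comment, and only the exact language directory is checked (real gettext lookup also tries variants such as fr_FR, which this sketch ignores).

```c
#include <stdio.h>
#include <stdlib.h>
#include <unistd.h>

/* Hypothetical helper: use `preferred` if its catalog file is readable,
 * otherwise `fallback`. Exports the choice through LANGUAGE and returns it. */
const char *choose_language(const char *localedir, const char *domain,
                            const char *preferred, const char *fallback)
{
    char path[4096];
    int n = snprintf(path, sizeof path, "%s/%s/LC_MESSAGES/%s.mo",
                     localedir, preferred, domain);

    /* Fall back if the path was truncated or the catalog is not readable. */
    const char *chosen =
        (n > 0 && (size_t)n < sizeof path && access(path, R_OK) == 0)
            ? preferred
            : fallback;

    setenv("LANGUAGE", chosen, 1);
    return chosen;
}
```

With GNU gettext specifically, LANGUAGE may also hold a colon-separated priority list such as fr:de, which may make the explicit file check unnecessary on platforms that use that implementation.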
The work environment is application-specific and is inherited by child processes. Your application can also modify it by itself. In other words, the environment provides a lot of defaults, which you can override in all ways. If you want to write an HTTP server, which switches language for each request, that is no problem. GNU gettext is probably the most widely used tool to internationalize server software.
Lars D
I must admit that I'm not too much into AIX and HPUX, but normally msgfmt will use the same encoding inside the mo file, as was used inside the po file. If you encode the po file using utf-8, as I always do, you might see that all platforms return utf-8. Otherwise, you can always make platform-dependent compilation to convert the character set properly.
Lars D
Anyway, when I talk about "how good it is", I mostly refer to the costs of internationalization, localization and maintenance. There may be parameters where other tools are better, but I haven't seen a tool where the amount of manpower needed for internationalization and localization is as low as for GNU gettext.
Lars D