
EDIT: I would really like to see some general discussion about the formats and their pros and cons!

EDIT2: The bounty didn't really help to create the needed discussion; there are a few interesting answers, but comprehensive coverage of the topic is still missing. Six people have marked the question as a favourite, which shows me that there is interest in this discussion.

When deciding on internationalization, the toughest part IMO is the choice of storage format.

For example the Zend PHP Framework offers the following adapters which cover pretty much all my options:

  • Array : no, hard to maintain
  • CSV : don't know, possible problems with encoding
  • Gettext : frequently used, poEdit available for all platforms, BUT complicated
  • INI : don't know, possible problems with encoding
  • TBX : no clue
  • TMX : too heavyweight? no freely available editors.
  • QT : not very widespread, no free tools
  • XLIFF : the coming standard? BUT no free tools available.
  • XMLTM : no, not what I need

Basically I'm stuck with the four 'bold' choices. I would like to use INI files, but I keep reading about encoding problems... is it really a problem if I use strict UTF-8 throughout (files, connections, DB, etc.)?

I'm on Windows and I tried to figure out how poEdit works, but just didn't manage. There are no tutorials on the web either. Is gettext still a viable choice, or an endangered species anyway?

What about XLIFF, has anybody worked with it? Any tips on what tools to use?

Any ideas for Eclipse integration of any of these technologies?

A: 

You can use INI if you want; it's just that INI has no way to declare that it is in UTF-8, so if someone opens your INI file with an editor, they might corrupt it.

So it comes down to whether you can trust whoever edits the files to keep them in UTF-8.

You can add a BOM at the start of the file; some editors know about it.
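
For illustration, a minimal PHP sketch of reading such a BOM-prefixed INI file (the file name and key are my assumptions): parse_ini_file() would treat the BOM bytes as part of the first key, so the sketch strips the BOM and parses the remainder as a string.

    <?php
    // Sketch: read an INI translation file that may start with a UTF-8 BOM.
    // parse_ini_file() would treat the BOM as part of the first key, so we
    // strip it and parse from a string instead. File name is illustrative.
    $raw = file_get_contents('default.de.ini');
    if (substr($raw, 0, 3) === "\xEF\xBB\xBF") {
        $raw = substr($raw, 3); // drop the BOM
    }
    $strings = parse_ini_string($raw);
    echo $strings['pleaselogin'] ?? '';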

What do you want it to store? User-generated content or your application resources?

CiNN
I want the INI files to store the language strings. I would then have one INI file for each language and each module, like default.en, default.de, default.fr ...
tharkun
Then you can use a simple INI; you just need to state in your docs that the translation files NEED to be in UTF-8. And if a translator doesn't comply, it's their fault :)
CiNN
I did a variant of this (e.g. an INI-type file per language) and loaded it into a custom hashtable. It was fast and worked nicely, except for having to work around some home-grown OO in the C app.
torial
+10  A: 

POEdit isn't really hard to get the hang of. Just create a new .po file, then tell it to import strings from the source files. The program scans your PHP files for any function calls matching _("Text"), gettext("Text"), etc. You can even specify your own functions to look for.

You then enter a translation in the appropriate box. When you save your .po file, a .mo file is automatically generated. That's just a binary version of the translations that gettext can easily parse.

In your PHP script make a call to bindtextdomain() telling it where your .mo file is located. Now any strings passed to gettext (or the underscore function) will be translated.
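
A minimal sketch of that setup, assuming a conventional ./locale/de_DE/LC_MESSAGES/messages.mo layout (the locale, path, and domain name "messages" are illustrative, not from the answer):

    <?php
    // Hedged sketch of a PHP gettext setup; the locale, directory layout,
    // and the domain name "messages" are assumptions for illustration.
    $locale = 'de_DE.utf8';
    putenv("LC_ALL=$locale");
    setlocale(LC_ALL, $locale);

    // Expects ./locale/de_DE/LC_MESSAGES/messages.mo
    bindtextdomain('messages', __DIR__ . '/locale');
    bind_textdomain_codeset('messages', 'UTF-8');
    textdomain('messages');

    echo _("Please enter your login and password below.");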

It makes it really easy to keep your translation files up to date. POEdit also has some neat features like allowing comments, showing changed and dropped strings and allowing fuzzy matches, which means you don't have to re-translate strings that have been slightly modified.

Josh
A: 

I have worked with two of these formats on the l10n side: TMX and XLIFF. They are pretty similar. TMX is more popular nowadays, but XLIFF is gaining support quickly. There was at least one free XLIFF editor when I last looked into it, Transolution, but it is no longer being developed.

Nemanja Trifunovic
A: 

One rather simple approach is to just use a resource file and resource script. Programs like MSVC have no problem editing them, and they're reasonably friendly to other systems and to text editors. You can just create separate string tables (and bitmap tables) for each language and mark each table with the language it is in.

Brian
A: 

None of those choices looks very appetizing to me.

If you're sending files out for translation into multiple languages, then you want to be able to trust that the encodings are correct, especially if no one on your team speaks those languages. Sometimes it's difficult to spot an encoding problem in a foreign language, and it is just too easy to inadvertently corrupt file encodings if you let your OS 'guess'.

You really want a format that declares its encoding. Otherwise, translators or their translation tools might select something other than UTF-8. For my money, any kind of simple XML format is best, but it looks like you'd need to roll your own in Zend. XLIFF and TMX are certainly overkill.

A format like Java's XML resources would be ideal.
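
As a sketch of what such a self-describing format could look like in PHP (the file layout, element names, and loader are my assumptions, not an existing Zend adapter):

    <?php
    // Sketch of a minimal XML resource file plus loader; all names are
    // illustrative assumptions. Example lang/de.xml (with an XML
    // declaration stating encoding="UTF-8"):
    //   <strings>
    //     <string key="pleaselogin">Bitte Login und Passwort eingeben.</string>
    //   </strings>

    function loadStrings(string $path): array
    {
        // SimpleXML honours the encoding declared in the XML prolog.
        $xml = simplexml_load_file($path);
        $strings = [];
        foreach ($xml->string as $node) {
            $strings[(string) $node['key']] = (string) $node;
        }
        return $strings;
    }

    $t = loadStrings(__DIR__ . '/lang/de.xml');
    echo $t['pleaselogin'];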

Mike Sickler
Why would you need to roll your own? Have you used ZF?
sims
+2  A: 

There is always the Translate Toolkit, which allows converting between (I think) all the formats mentioned, as well as its preferred formats, Gettext (PO) and XLIFF.

Jakub Narębski
A: 

I do the data storage myself using a custom design: all displayed text is stored in the DB.

I have two tables. The first table has an identity value, a 32-character varchar field (indexed on this field) and a 200-character English description of the phrase.

My second table has the identity value from the first table, a language code (EN_UK, EN_US, etc.) and an NVARCHAR column for the text.

I use NVARCHAR for the text because it supports other character sets which I don't use yet.

The 32 character varchar in the first table stores something like 'pleaselogin' while the second table actually stores the full "Please enter your login and password below".
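
A hedged PDO sketch of that lookup (the table and column names are guesses based on the description, not the author's actual schema):

    <?php
    // Hedged sketch of the two-table lookup described above, using PDO.
    // Table and column names are assumptions; $db is an existing PDO
    // connection.
    function phrase(PDO $db, string $key, string $lang): string
    {
        $stmt = $db->prepare(
            'SELECT t.text
               FROM phrase p
               JOIN translation t ON t.phrase_id = p.id
              WHERE p.code = :code AND t.lang = :lang'
        );
        $stmt->execute(['code' => $key, 'lang' => $lang]);
        return (string) $stmt->fetchColumn();
    }

    echo phrase($db, 'pleaselogin', 'EN_UK');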

I have created a huge list of dynamic values which I replace at runtime. An example would be "You have {[dynamic:passworddaysremain]} days to change your password." - this allows me to work around word-ordering differences between languages.
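
A sketch of that substitution in PHP (the {[dynamic:...]} token syntax is from the answer; the helper name and sample values are illustrative):

    <?php
    // Sketch of the runtime placeholder substitution described above.
    function expandDynamic(string $text, array $values): string
    {
        return preg_replace_callback(
            '/\{\[dynamic:(\w+)\]\}/',
            // Leave unknown tokens untouched so they are easy to spot.
            fn (array $m) => (string) ($values[$m[1]] ?? $m[0]),
            $text
        );
    }

    echo expandDynamic(
        'You have {[dynamic:passworddaysremain]} days to change your password.',
        ['passworddaysremain' => 14]
    );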

I have only had to deal with Arabic numerals so far, but will have to work something out for the first user who requires non-Arabic numerals.

I actually pull this information out of the database every two hours and cache it to disk in one XML file per language, making extensive use of CDATA sections.

There are many options available; for performance you could use HTML templates for each language. My method works well, but it does use the XML DOM a lot at runtime to create the pages.

John
A: 

This might be a little different from what's been posted so far and may not be exactly what you're looking for, but I thought I would add it, if for nothing else than a different approach. I went with an object-oriented approach: I created a system that encapsulates language files into a class by storing them in an array of string => translation pairs. Access to a translation is through a translate method that takes the key string as a parameter. Extending classes inherit the parent's language array and can add to it or overwrite it. Because the classes are extensible, you can change a base class and have the changes propagate through the children, making this more maintainable than an array by itself. Plus, you only load the classes you need.
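
A minimal PHP sketch of that idea (class and key names are illustrative assumptions, not the answerer's actual code):

    <?php
    // Sketch of the extensible language-class idea described above.
    class Lang
    {
        protected array $strings = [
            'btn_save' => 'Save',
        ];

        public function translate(string $key): string
        {
            // Fall back to the key so missing entries are visible.
            return $this->strings[$key] ?? $key;
        }
    }

    class LangDe extends Lang
    {
        public function __construct()
        {
            // Children inherit the parent's array and may override entries.
            $this->strings = array_merge($this->strings, [
                'btn_save' => 'Speichern',
            ]);
        }
    }

    echo (new LangDe())->translate('btn_save'); // "Speichern"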

VirtuosiMedia
A: 

We just store the strings in the DB and have a translator mode built into the application to handle actually adding strings for different languages.

In the application we use various tricks to create text ids, like

£("btn_save")
£(Order.class,"amt")

The translations are loaded from the DB when the system boots, or when a reload is manually triggered. The £ method takes care of looking up the translated string according to the language specified in the user session.
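
The answer's snippets look like Java rather than PHP; a hedged PHP sketch of the same idea (all names here are assumptions) might be:

    <?php
    // Sketch of the boot-time cache plus session-language lookup.
    session_start();

    // Filled once at boot (or on manual reload) from the DB:
    // [languageCode][textId] => translated string.
    $translations = [
        'en' => ['btn_save' => 'Save'],
        'sv' => ['btn_save' => 'Spara'],
    ];

    function t(string $textId): string
    {
        global $translations;
        $lang = $_SESSION['lang'] ?? 'en';
        return $translations[$lang][$textId] ?? $textId;
    }

    echo t('btn_save');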

John Nilsson
A: 

Hello,

You can check out my l10n tool, iL10Nz, at http://www.myl10n.net

You can upload PO/POT, XLIFF, and INI files, translate them, and download the results.

You can also check out this video on YouTube: http://www.youtube.com/watch?v=LJLmxMFxaxA

Thanks, Olivier