views:

220

answers:

6

In a multilanguage application with lookup tables, what's the best way to handle translations?

For example for a country lookup table, a US user should see English country names, as a German should see German names. But still their IDs should be the same.

I can think of the following:

  • Add a different lookup table for every language
  • Use a single lookup table with multiple entries for the same country flagged according to its language.
  • Get rid of all lookup tables and do all lookups by program code
  • [Any other idea I didn't think of?]
+2  A: 

The big question here is - can translations can be altered by end users?

If the answer is "NO", resource bundles are easier and faster to use than tables. Instead of actual text your tables will contain resource key (or you could use primary key for that purpose if it's textual rather than numeric) and appropriate resource bundle will contain the translation.

If the answer is "YES", you have to store translations in the database. I have found, however, that the easiest approach in this scenario is to mimic the above resource bundle functionality in the database - e.g. have a single table with "locale", "resource key", "resource value" columns that all other tables will use to look up actual localized text.

ChssPly76
+3  A: 

Disjoint presentation from programming. Internally, use IDs for everything; when presenting to users, present localized data.

McWafflestix
A: 

I did a multi-lingual application once. Everything had to be multi-lingual, including the content users entered, and it also had to be easy to do a translation on this content.

Now if its just for static text most recommend resource xml files to do this. I believe this is supported by the .NET framework quite well.

If it has to be dynamic, then I did a table structure of

Table ResourceStrings resourceId GUId PK cultureCode String PK text String

This then allowed me todo cascade read on the culture code as well. So if someone had a culture code of en-us, I could do a query where I did

SELECT FROM ResourceStrings
WHERE resourceId = <AND ID> AND cultureCode IS IN ('en','en-us')
ORDER BY cultureCode DESC

That way the longest culture code would come back first and be the most specific. Now I would recommend this ONLY if you are allowing users to enter text and translate it. I needed this because the content had to be multi-lingual as well as the application itself.

Zoidberg
A: 

Check out something like the Adobe Source Libraries' xstring data structures and algorithms. Localization there is handled with an id for the string as well as context detailing the localization. Localization tables can be stored in XML and strings are localized at runtime based on runtime context (language, country, platform, etc.) Though the code itself works I wouldn't consider it production quality. Nevertheless the concepts are solid.

fbrereto
A: 

In my current project ( a custom CMS written in django ), our solution for I18N models is based on this example snippet : http://www.djangosnippets.org/snippets/855/ , which I extended to be more usable in templates and integrated within the administration interface.

Basically, each kind content has two tables : one with the common fields ( like articles' category ) and one with translatable content ( title, body, slug - unique label used in the url - etc. ). Obviously there is a ony-to-many relationship between common model and translation model. Here's the example the author gives :

class Article(models.Model):
    author = models.CharField(max_length = 40)

class ArticleI18N(I18NModel):
    title = models.CharField(max_length = 120)
    body = models.TextField()

I think that this way, the database layout is really close to the concept of having content with common attributes and translatable fields.

But then the tricky part is staying DRY in your code, or you'll end up with a mess of boilerplate every time you need to handle translatable content. Fortunately python's flexibility was of great help.

If your programming language and environment do not allow similar tricks ( like dynamic creation of subclasses, python's metaclasses - some kind of inheritance hook, etc. ), I guess this kind of database layout is going to be more a curse than a blessing.

So, keep the YAGNI principle in mind. If you need translations inside your models with less complications, I've seen other effective ways, which are fine as long as you can afford the limited flexibility and the lack of conceptual integrity of those alternatives :

  • 1) use additional columns for every translatable column and every language : title_en, title_fr, title_de, content_en, content_fr, content_de, ...
  • 2) serialize multiple languages in one column.
    For instance : title = "|EN|Welcome|FR|Bienvenue|DE|Willkommen"
    I don't like that one particularly, but what matters really here is whether it's integrating nicely within an existing environment, which was the case.
  • 3) Sometimes the link between the same content in different languages doesn't need to be strict. I think it's the case of the articles in wikipedia - translations are just hyperlinks manually set by the authors. As a consequence, those links are less exploitable by the software, but what matters here is being browsable by a user.
vincent
A: 

Here's a way to do this at the database level.

When I needed to do this, I broke each code table up into two tables. One contained culture-invariant data: the code's internal ID for sure, the code itself if the code was culture-invariant, and possibly other columns (like sorting/grouping categories). The other contained culture-specific data: descriptions, culture-specific codes if appropriate, and so on.

(Culture-specific codes are a hornet's nest; don't kick it if you don't have to. It can be a little hard to get your head around the idea that US and EU are the same code in different languages. But there are countries where it can be politically unpalatable to make the French-speaking users use US as their abbreviation for États-Unis. Well, a country.)

Having a culture-invariant code table allows you to establish foreign-key constraints between it and main tables that use it. Constructing culture-specific queries is pretty straightforward:

SELECT m.*, c.Code, ISNULL(s.Description, lf.Message)
FROM MainTable m
JOIN FooCodeData c ON m.CodeID = c.ID
LEFT JOIN CultureSpecificFooCodeData s ON s.CodeID = c.ID AND s.Culture = @Culture
JOIN LookupFailure lf ON lf.Culture = @Culture

Note that if your codes are culture-specific, you don't even need to join FooCodeData in that query, selecting s.Code instead.

Note also one inherent weakness of this approach, as shown by the LEFT JOIN and ISNULL in that query: you can't create a constraint to guarantee that a culture-specific row exists for every code/culture combination. That's what LookupFailure is for: it gets the culture-specific message that indicates that the culture-specific code record has not been entered for a given code.

Robert Rossney