views:

4239

answers:

5

I'm developing a multilanguage software. As far as the application code goes, localizability is not an issue. We can use language specific resources and have all kinds of tools that work well with them.

But what is the best approach in defining a multilanguage database schema? Let's say we have a lot of tables (100 or more), and each table can have multiple columns that can be localized (most of nvarchar columns should be localizable). For instance one of the tables might hold product information:

CREATE TABLE T_PRODUCT (NAME NVARCHAR(50), DESCRIPTION NTEXT, PRICE NUMBER(18, 2))

I can think of three approaches to support multilingual text in NAME and DESCRIPTION columns:

  1. Separate column for each language

When we add a new language to the system, we must create additional columns to store the translated text, like this:

CREATE TABLE T_PRODUCT (NAME_EN NVARCHAR(50), NAME_DE NVARCHAR(50), NAME_SP NVARCHAR(50), DESCRIPTION_EN NTEXT, DESCRIPTION_DE NTEXT, DESCRIPTION_SP NTEXT, PRICE NUMBER(18,2))
  1. Translation table with columns for each language

Instead of storing translated text, only a foreign key to the translations table is stored. The translations table contains a column for each language.

CREATE TABLE T_PRODUCT (NAME_FK int, DESCRIPTION_FK int, PRICE NUMBER(18, 2))
CREATE TABLE T_TRANSLATION (TRANSLATION_ID, TEXT_EN NTEXT, TEXT_DE NTEXT, TEXT_SP NTEXT)
  1. Translation tables with rows for each language

Instead of storing translated text, only a foreign key to the translations table is stored. The translations table contains only a key, and a separate table contains a row for each translation to a language.

CREATE TABLE T_PRODUCT (NAME_FK int, DESCRIPTION_FK int, PRICE NUMBER(18, 2))
CREATE TABLE T_TRANSLATION (TRANSLATION_ID)
CREATE TABLE T_TRANSLATION_ENTRY (TRANSLATION_FK, LANGUAGE_FK, TRANSLATED_TEXT NTEXT)
CREATE TABLE T_TRANSLATION_LANGUAGE (LANGUAGE_ID, LANGUAGE_CODE CHAR(2))

There are pros and cons to each solution, and I would like to know what are your experiences with these approaches, what do you recommend and how would you go about designing a multilanguage database schema.

+4  A: 

The third option is the best, for a few reasons:

  • Doesn't require altering the database schema for new languages (and thus limiting code changes)
  • Doesn't require a lot of space for unimplemented languages or translations of a a particular item
  • Provides the most flexibility
  • You don't end up with sparse tables
  • You don't have to worry about null keys and checking that you're displaying an existing translation instead of some null entry.
  • If you change or expand your database to encompass other translatable items/things/etc you can use the same tables and system - this is very uncoupled from the rest of the data.
Adam Davis
I agree, though personally I'd have a localised table for each main table, to allow foreign keys to be implemented.
Neil Barnwell
Although the third option is the most clean and sound implementation of the problem it is more complex then first one. I think displaying, editing, reporting the general version needs so much extra effort that it does not always acceptable. I have implemented both solutions, the simpler was enough when the users needed a read-only (sometimes missing) translation of the "main" application language.
rics
+4  A: 

I usually would go for this approach (not actual sql), this corresponds with your last option.

table Product
productid INT PK, price DECIMAL, translationid INT FK

table Translation
translationid INT PK

table TranslationItem
translationitemid INT PK, translationid INT FK, text VARCHAR, languagecode CHAR(2)

view ProductView
select * from Product
inner join Translation
inner join TranslationItem
where languagecode='en'

Because having all translatable texts in one place makes maintenance so much easier. Sometimes translations are outsourced to translation bureaus, this way you can send them just one big export file, and import it back just as easily.

A: 
Aleris
I know that localization is a much broader topic and I am aware of the issues that you bring to my attention, but currently I am looking for an answer for a very specific problem of schema design. I assume that new languages will be added incrementally and each will be translated almost completely.
qbeuek
A: 

What do you think about having a related translation table for each translatable table?

CREATE TABLE T_PRODUCT (pr_id int, PRICE NUMBER(18, 2))

CREATE TABLE T_PRODUCT_tr (pr_id INT FK, languagecode varchar, pr_name text, pr_descr text)

This way if you have multiple translatable column it would only require a single join to get it + since you are not autogenerating a translationid it may be easier to import items together with their related translations.

The negative side of this is that if you have a complex language fallback mechanism you may need to implement that for each translation table - if you are relying on some stored procedure to do that. If you do that from the app this will probably not be a problem.

Let me know what you think - I am also about to make a decision on this for our next application. So far we have used your 3rd type.

This option is similar to my option nr 1 but better. It is still hard to maintain and requires creating new tables for new languages, so I'd be reluctant to implement it.
qbeuek
it doesn't require a new table for a new language - you simply add a new row to the appropriate _tr table with your new language, you only need to create a new _tr table if you create a new translatable table
A: 

plz can any one give me example on the therd one...?