views:

272

answers:

1

I need to make a database that is highly localized. For almost all entities I need a translation to 5+ languages. Some entities even require and additional resource localized (like images which I enter as paths).

The question now is:

1: LOOKUP TABLE PER ENTITY/TABLE (kinda bloated schema?)

should I create a "Localized" localization lookup table for each table I need localized values (and use standard int/bigint PKs for elements) Like here:

MYITEMS
-------
 - MyItemId BIGINT PK
 - MyItemPrice DECIMAL

MYITEMLOCALIZED
---------------
 - CPK_MyItemId BIGINT FK
 - CPK_LanguageCode NCHAR
 - LocalizedName NVARCHAR
 - LocalizedResourcePath NVARCHAR

CUSTOMERS
---------
 - CustomerId BIGINT PK
 - CustomerName NVARCHAR

CUSTOMERLOCALIZED
---------------
 - CPK_CustomerId BIGINT FK
 - CPK_LanguageCode NCHAR
 - LocalizedName NVARCHAR
 - LocalizedResourcePath NVARCHAR

or

2: GUIDS WITH SINGLE LOOKUP LOCALIZATION TABLE (heavy guid usage?)

should I use GUIDs as PKs and then just use a single name and a single resource localization table.

MYITEMS
-------
 - MyItemId uniqueidentifier PK
 - MyItemPrice DECIMAL    

CUSTOMERS
---------
 - CustomerId uniqueidentifier PK
 - CustomerName NVARCHAR

LOCALIZED
    ---------------
     - CPK_ElementGuid uniqueidentifier FK
     - CPK_LanguageCode NCHAR
     - LocalizedValue NVARCHAR
     - LocalizedResourcePath NVARCHAR

3: SINGLE LOOKUP BUT GUIDS ONLY FOR LOCALIZATION (best of 2 worlds?)

should I use normal int/bigint PKs and then add a GUID column for each column I need localized and store localized values into a single localization lookup table.

MYITEMS
-------
 - MyItemId BIGINT PK
 - MyItemPrice DECIMAL
 - ItemNameLocalizationGuid uniqueidentifier(GUID)
 - ItemPictureLocalizationGuid uniqueidentifier(GUID)


CUSTOMERS
---------
 - CustomerId BIGINT PK
 - CustomerName NVARCHAR
 - CustomeerNameLocalizationGuid uniqueidentifier(GUID)

LOCALIZED
---------------
 - CPK_ElementGuid uniqueidentifier FK
 - CPK_LanguageCode NCHAR
 - LocalizedValue NVARCHAR

4: LOOKUP TABLE THAT RETURNS LOCALIZATION ID (go back and forth?)

should I create tables with no guids, but store localization id in the mother-table?

Like here:

MYITEMS
-------
 - MyItemId BIGINT PK
 - MyItemPrice DECIMAL
 - MyItemNameLocalizedId BIGINT    

CUSTOMERS
---------
 - CustomerId BIGINT PK
 - CustomerName NVARCHAR
 - CustomerGenderLocalizedId BIGINT

LOCALIZED
---------------
 - LocalizationId BIGINT PK
 - CustomerId BIGINT FK
 - LanguageCode NCHAR
 - LocalizedName NVARCHAR
 - LocalizedResourcePath NVARCHAR

If I use GUIDs as PKs I've read I'll suffer huge performance and data size penalty, but I will also instantly deal with element uniqueness across servers, dbs...

+5  A: 

First of all, I'd strongly recommend using an existing standard for the localization identifiers - don't re-invent yet another system! Use the ISO-639 standard codes for language, e.g. "en" for English, "fr" for French etc.

See Wikipedia for a list of all the defined codes.

Secondly, in my experience and my judgment, I would use a language table per entity.

We typically have some "system name" on the main table, e.g. the English text, and then we have a table "(entity)_TX" for the textual representation in various languages.

Something like this:

  TABLE CustomerType
      CustomerTypeID    INT IDENTITY(1,1) PK
      CustomerTypeName  VARCHAR(100)    -- English "system" name, e.g. "Gold customer"

  TABLE CustomerType_TX
      CustomerTypeID    INT
      LanguageID        CHAR(2)   -- ISO-639 codes
      CustomerTypeText  VARCHAR(200)   -- translated texts

To me, this is clearer and more explicit and more "intuitive" than having a single, GUID-based encoding scheme.

marc_s
+1 for using the standard codes.
devstuff
That is my option #1 actually.Well, standard codes like en-gb and en-us are a bit over char(2) so I'll have to widen that a bit in this case :)
BuzzBubba
please keep in mind that GUIDs are not guaranteed to be unique either...granted there is a high improbability that you will ever see a duplicate but still...
espais
@BuzzBubba: yes, if you need to distinguish the regionalized languages like "en-GB" vs. "en-US" vs. "en-CA", then yes, you need some more characters. But still - I'd use the standard codes, if ever possible - no need to invent yet another set of constants for something that's really already been solved :-)
marc_s
Wouldn't CustomerType_TX.CustomerTypeText need to be nvarchar?
Thomas
@Thomas: depends - if you really truly will have to deal with Asian, Hebrew, Arabic, Cyrillic languages and alphabets - yes. If you're exclusively with western european languages, having 2 bytes for each character is a waste of space.
marc_s