views:

2221

answers:

9

Question

I'm sure many of you have been faced by the challenge of localizing a database backend to an application. If you've not then I'd be pretty confident in saying that the odds of you having to do so in the future is quite large. I'm talking anout storing multiple translations of texts (and the same can be said for currency etc.) for your database entities.

For example the classic "Category" table might have a Name and a Description column which should be globalized. One way would be to do have a "Text" table for each of your entities and then do a join to retreive the values based on the provided language.

This leaves you with a lot of "Text" tables, one for each entity which you want to localize, with the addition of a TextType to distinguish between the various texts that it may store.

I'm curious if there are any, documented, best-practises / design patterns on implementing this kind of support into a SQL Server 2005/2008 datebase (I'm being specific about the RDBMS since it might contain supported keywords and such which helps with the implementation)?

Thoughts on XML approach

One idea I have been toying with (albeit only in my head so far) was to leverage the XML datatype introduced in SQL Server 2005. The idea was to make columns which should support localization, of the XML datatype (and bind a schema to it). The XML would contain the localized strings along with the language code / culture it was tied to.

Something along the lines of

Product
ID (int, identity)
Name (XML ...)
Description (XML ...)

Then you would have something like this as the XML

<localization>
  <text culture="sv-SE">Detta är ett namn</text>
  <text culture="en-EN">This is a name</text>
</localization>

You could then do (This isn't production code so I'll use the *)

SELECT *
From Product
Where Product.ID = 10

And you would get back the product with all localized texts which would mean you would have to do the extraction on the client-side. The biggest problem here is obviously the amount of extra data that you would have to return on each query, The benefits would be a cleaner design with no look-up tables, joins and so on.

Btw, what ever method I do end up using in my design I will still be using Linq To SQL (.NET Platform) to query the database (the XML approach should be a problem since it would return an XElement which could be interpreted client-side)

So suggestion on database localization design patterns, and possibly comments on the XML thought, would be very apprechiated.

+1  A: 

I can't see why you need multiple text tables. A single text table, with a "globally" unique text ID, should be sufficient. The table would have ID, language, text columns, and you would only ever get the text in the language that you need to present (or perhaps not retrieve the text at all). The join should be fairly efficient, since the combination of (ID, language) is the primary key.

Martin v. Löwis
I have several objections to this design:1. Not only texts may have localized values.2. Consider the sample Select in the post. Instead of getting a Name and Description you would get Ids with which you would have to do a second roundtrip. Or you would have to use Sub-Selects.3. Painful to debug
TToni
1. For example? 2. SELECT Product.ID, Product.Name, Text.Message from Product, Text where Product.Description = Text.ID (and Text.Lang="En"); 3. How so?
Martin v. Löwis
Your suggestion is flexible, however, if you are using an ORM it is perhaps less useful...
Joe R
+1  A: 

That's one of the questions that are difficult to answer because there are so many "it depends" in the answer :-)

The answer depends on the amount of localized items in the database, on deployment scenarios, caching issues, access patterns and so on. If you can give us some data on how big the application is, how many concurrent users it will have and how it will be deployed, that would be very helpful.

In general terms I usually use one of two approaches:

  1. Store the localized items near the executable (localized ressource dlls)
  2. Store localized items in the DB and introduce a localeID column in tables that contain the localized items.

The advantage of the first method is the good VisualStudio support. The advantage of the second is the centralized deployment.

TToni
Well just concider your average e-commerce where you have category, products and so on. Each of the entities is likely to have Name, Description etc. (standard vanilla stuff) columns. The values has to be in the db, question is the design and if there are any best-practises/patterns
TheCodeJunkie
OK, that doesn't really help, but I'll try a second answer anyway :-)
TToni
+1  A: 

I see no advantage in using the XML-columns to store the localized values. Except maybe that you have all localized versions of one item "in one place" if that's worth something to you.

I would propose to use a cultureID-column in each table that has localizable items. That way you don't need any XML-handling at all. You already have your data in a relational schema so why introduce another layer of complexity when the relational schema is perfectly capable of handling the problem?

Let's say "sv-SE" has cultureID = 1 and "en-EN" has 2.

Then your query would be modified as

SELECT *
From Product
Where Product.ID = 10 AND Product.cultureID = 1

for a swedish client.

This solution I have seen frequently in localized databases. It scales well with both number of cultures and number of datarecords. It avoids XML-parsing and processing and is easy to implement.

And another point: The XML-solution gives you a flexibility you don't need: You could for example take the "sv-SE"-value from the "Name"-column and the "en-EN"-value from the "Description"-column. However, you don't need this since your client will request only one culture at a time. Flexibility usually has a cost. In this case it is that you need to parse all columns individually while with the cultureID solution you get the whole record with all the values right for the requested culture.

TToni
Something is not right with this suggestion. Why would an entity like Product need have a reference to a Culture/Locale? Are we assuming that ALL attributes of a Product are localizable??
Yarik
@Yarik: I believe he is recommending two records in the database for a single product ID of "10". Each record would be of a different cultureID. This would remove the need for a "text" table as the original author mentioned, by having an instance of each localized version in the DB. To downside to TToni's approach is you no longer have unique ProductID for a clustered index. You'd have to double-key on ProductID and CultureID unique to get the performance back out. And, all of your queries will have to account for the culture.
eduncan911
+1  A: 

I think you can stick with XML which allows for a cleaner design. I would go further and take advantage of the xml:lang attribute which is designed for this usage :

<l10n>
  <text xml:lang="sv-SE">Detta är ett namn</text>
  <text xml:lang="en-EN">This is a name</text>
</l10n>

One step further, you could select the localized resource in your query via a XPath query (as suggested in the comments) to avoid any client side treatment. This would give something like this (untested) :

SELECT Name.value('(l10n/text[lang()="en"])[1]', 'NVARCHAR(MAX)')
  FROM Product
  WHERE Product.ID=10;

Note that this solution would be an elegant but less efficient solution than the separate table one. Which may be OK for some application.

Mac
I did not use XPath queries in SQL, so I am really curious how would an update look with such design.
Yarik
A: 

I like the XML approach, because the separate-table-solution would NOT return a result if e.g. there is no swedish translation (cultureID = 1) unless you do an outer join. But nevertheless you can NOT fall back to English. With the XML approach you simply can fall back to English. Any news on the XML approach in a producitve environment?

A: 

Here some thoghts on the Rick Strahl's blog:

Localization of database Localization of JavaScript

I do prefer to use a single switch in a UserSetting table , which is used by calling stored procedure ... here some of the code

CREATE TABLE [dbo].[Lang_en_US_Msg](
    [MsgId] [int] IDENTITY(1,1) NOT NULL,
    [MsgKey] [varchar](200) NOT NULL,
    [MsgTxt] [varchar](2000) NOT NULL,
    [MsgDescription] [varchar](2000) NOT NULL,
 CONSTRAINT [PK_Lang_US-us__Msg] PRIMARY KEY CLUSTERED 
(
    [MsgId] ASC
)WITH (PAD_INDEX  = OFF, STATISTICS_NORECOMPUTE  = OFF, IGNORE_DUP_KEY = OFF, ALLOW_ROW_LOCKS  = ON, ALLOW_PAGE_LOCKS  = ON) ON [PRIMARY]
) ON [PRIMARY]

GO

CREATE TABLE [dbo].[User](
    [UserId] [int] IDENTITY(1,1) NOT NULL,
    [FirstName] [varchar](50) NOT NULL,
    [MiddleName] [varchar](50) NULL,
    [LastName] [varchar](50) NULL,
    [DomainName] [varchar](50) NULL,
 CONSTRAINT [PK_User] PRIMARY KEY CLUSTERED 
(
    [UserId] ASC
)WITH (PAD_INDEX  = OFF, STATISTICS_NORECOMPUTE  = OFF, IGNORE_DUP_KEY = OFF, ALLOW_ROW_LOCKS  = ON, ALLOW_PAGE_LOCKS  = ON) ON [PRIMARY]
) ON [PRIMARY]

CREATE TABLE [dbo].[UserSetting](
    [UserSettingId] [int] IDENTITY(1,1) NOT NULL,
    [UserId] [int] NOT NULL,
    [CultureInfo] [varchar](50) NOT NULL,
    [GuiLanguage] [varchar](10) NOT NULL,
 CONSTRAINT [PK_UserSetting] PRIMARY KEY CLUSTERED 
(
    [UserSettingId] ASC
)WITH (PAD_INDEX  = OFF, STATISTICS_NORECOMPUTE  = OFF, IGNORE_DUP_KEY = OFF, ALLOW_ROW_LOCKS  = ON, ALLOW_PAGE_LOCKS  = ON) ON [PRIMARY]
) ON [PRIMARY]

GO

 ALTER TABLE [dbo].[UserSetting] ADD  CONSTRAINT [DF_UserSetting_CultureInfo]  DEFAULT ('fi-FI') FOR [CultureInfo]
 GO

 CREATE TABLE [dbo].[Lang_fi_FI_Msg](
    [MsgId] [int] IDENTITY(1,1) NOT NULL,
    [MsgKey] [varchar](200) NOT NULL,
    [MsgTxt] [varchar](2000) NOT NULL,
    [MsgDescription] [varchar](2000) NOT NULL,
    [DbSysNameForExpansion] [varchar](50) NULL,
 CONSTRAINT [PK_Lang_Fi-fi__Msg] PRIMARY KEY CLUSTERED 
(
    [MsgId] ASC
)WITH (PAD_INDEX  = OFF, STATISTICS_NORECOMPUTE  = OFF, IGNORE_DUP_KEY = OFF, ALLOW_ROW_LOCKS  = ON, ALLOW_PAGE_LOCKS  = ON) ON [PRIMARY]
) ON [PRIMARY]

CREATE PROCEDURE [dbo].[procGui_GetPageMsgs]
@domainUser varchar(50) ,           -- the domain_user performing the action  
@msgOut varchar(4000) OUT,        -- the (error) msg to be shown to the user   
@debugMsgOut varchar(4000) OUT ,   -- this variable holds the debug msg to be shown if debug level is enabled   
@ret int OUT                  -- the variable indicating success or failure 

AS                            
BEGIN -- proc start                            
 SET NOCOUNT ON;                            

declare @procedureName varchar(200)        
declare @procStep varchar(4000)  


set @procedureName = ( SELECT OBJECT_NAME(@@PROCID))        
set @msgOut = ' '     
set @debugMsgOut = ' '     
set @procStep = ' '     


BEGIN TRY        --begin try                  
set @ret = 1 --assume false from the beginning                  

--===============================================================
 --debug   set @procStep=@procStep + 'GETTING THE GUI LANGUAGE FOR THIS USER '
--===============================================================

declare @guiLanguage nvarchar(10)




if ( @domainUser is null)
    set @guiLanguage = (select Val from AppSetting where Name='guiLanguage')
else 
    set @guiLanguage = (select GuiLanguage from UserSetting us join [User] u on u.UserId = us.UserId where u.DomainName=@domainUser)

set @guiLanguage = REPLACE ( @guiLanguage , '-' , '_' ) ;


--===============================================================
set @procStep=@procStep + ' BUILDING THE SQL QUERY '
--===============================================================

DECLARE @sqlQuery AS nvarchar(2000)
SET @sqlQuery = 'SELECT  MsgKey , MsgTxt FROM dbo.lang_' + @guiLanguage + '_Msg'


--===============================================================
set @procStep=@procStep + 'EXECUTING THE SQL QUERY'
--===============================================================
print @sqlQuery

    exec sp_executesql @sqlQuery

    set @debugMsgOut = @procStep
    set @ret = @@ERROR                  


END TRY        --end try                  

BEGIN CATCH                        
 PRINT 'In CATCH block.                         
 Error number: ' + CAST(ERROR_NUMBER() AS varchar(10)) + '                        
 Error message: ' + ERROR_MESSAGE() + '                        
 Error severity: ' + CAST(ERROR_SEVERITY() AS varchar(10)) + '                        
 Error state: ' + CAST(ERROR_STATE() AS varchar(10)) + '                        
 XACT_STATE: ' + CAST(XACT_STATE() AS varchar(10));                        

set @msgOut = 'Failed to execute ' + @sqlQuery             
set @debugMsgOut = ' Error number: ' + CAST(ERROR_NUMBER() AS varchar(10)) +               
 'Error message: ' + ERROR_MESSAGE() + 'Error severity: ' + CAST(ERROR_SEVERITY() AS varchar(10)) +               
 'Error state: ' + CAST(ERROR_STATE() AS varchar(10)) + 'XACT_STATE: ' + CAST(XACT_STATE() AS varchar(10))                        

--record the error in the database                        
--debug    
 --EXEC [dbo].[procUtils_DebugDb]
    -- @DomainUser = @domainUser,
    -- @debugmsg = @debugMsgOut,
    -- @ret = 1,
    -- @procedureName = @procedureName ,
    -- @procedureStep = @procStep

 -- set @ret = 1                       

END CATCH                        


return  @ret                                   
END --procedure end
YordanGeorgiev
A: 

I see the delima overall - you have a single entity you must represent as a single instance (one ProductID of "10" for example), but have multiple localized text of different columns/properties. That is a tough one, and I do see the need for POS systems, that you only want to track that one ProductID = 10, not multiple products that have different ProductIDs, but are the same thing with just different text.

I would lean towards the XML column solution you and others have outlined here already. Yes, it's more data transfering over the wire - but, it keeps things simple and can be filtered with XElement if packet site becomes an issue.

The main drawback being the amount of data transfered over the wire from the DB to the service layer/UI/App. I would try to do some transformation on the SQL end before returning the result, to only return the one culture UI. You could always just SELECT the currect culsture via xml in an sproc, and return it as normal text as well.

Overall, this is different then, say, a Blog Post or CMS need for localization - which I've done a few of.

My approach to the Post scenerio would be similar to TToni's, with the exception of modelling the data from a the Domain's perspective (and a touch of BDD). With that said, focus on what you want to achieve:

Given a users culture is "sv-se"
When the user views a post list
It should list posts only in "sv-se" culture

This means that the user should see a list of posts only for their culture. The way we implemented this before was to pass in a set of cultures to query for based on what the user could see. If the user has 'sv-se' set as their primary, but also has selected they speak US English (en-us), then the query would be:

SELECT * FROM Post WHERE CultureUI IN ('sv-se', 'en-us')

Notice how this gives you all posts and their different PostID, unique to that language. The PostID isn't as important here on blogs because each post is bound to a different language. If there are copies being transcribed, that works fine here too as each post is unique to that culture, and therefore gets a unique set of comments and such.

But to go back to the 1st part of my answer, your need stems from the requirement of needing a single instance with multiple texts. An Xml column fits that fine.

eduncan911
A: 

Another approach to consider: don't store content in the database, but keep "application" supporting database records and "content" as separate entities.

I used an approach similar to this when creating multiple themes for my ecommerce website. Several of the products have a manufacturer logo which also must match the website theme. Since there is no real database support for themes, I had a problem. The solution I came up with was to use a token in the database to identify the ClientID of the image, rather than storing the URL of the image (which would vary based on theme).

Following the same approach, you could change your database from storing the name and description of the product into storing a name token and a description token that would identify the resource (in an resx file or in the database using Rick Strahl's approach) that contains the content. The built-in functionality of .NET would then handle the switching of language rather than attempting to do it in the database (it is rarely a good idea to put business logic in the database). You could then use the token on the client to look up the correct resource.

Label1.Text = GetLocalResourceObject("TokenStoredInDatabase").ToString()

The disadvantage to this approach is obviously keeping the database tokens and the resource tokens in sync (because products could be added without any descriptions), but could potentially be done easier using a resourceprovider such as the one Rick Strahl created. This approach may not work if you have products that change frequently, but for some people it might.

The advantage is that you have a small amount of data to transfer to the client from the database, your content is cleanly separated from your database, and your database won't need to be more complex than it is now.

On a side note, if you are running an e-Commerce store and actually want to get your localized pages indexed, you have to deviate a little from the seemingly natural way that Microsoft created. There is clearly disagreement between a practical and logical design flow and what Google recommends for SEO. Indeed, some webmasters have complained that their pages weren't indexed by the search engines for anything but the "default" culture because the search engines only will index a single URL once even if it changes depending on the culture of the browser.

Fortunately, there is a simple approach to get around this: put links on the page to translate it into the other languages based on a querystring parameter. An example of this can be found (oops, they won't let me post another link!!) and if you check, each culture of the page has been indexed by both Google and Yahoo (although not by Bing). A more advanced approach may use URL rewriting in combination with some fancy regular expressions to make your single localized page look like it has multiple directories, but actually pass a querystring parameter to the page instead.

NightOwl888
A: 

Indexing becomes an issue. I don't think you can index xml, and of course, you can't index it if you store it as a string because every string will start with <localization> <text culture="...">.

Rei Miyasaka