ansaurus

Question

Attempt at database localization using table-valued functions

Answer 1

A:

It's a safe bet that you'll have to translate more than product names. So I'd design the translation solution to handle any kind of string.

For example, you could have a localization table like:

Id, TranslatableStringId, Language, Translation

Then each product could have a translatable string associated with it, and so could, for example, the explanatory text at the top of the product list.
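A minimal sketch of this layout, using Python's built-in sqlite3 module so it runs anywhere. The table and column names follow the answer. The UNIQUE constraint on (TranslatableStringId, Language) and the sample strings are assumptions added for illustration. Note that Products.DescriptionId cannot be a real foreign key, because TranslatableStringId is not unique on its own.

```python
import sqlite3

def create_schema(conn: sqlite3.Connection) -> None:
    """Create a generic translation table plus a product table that points into it."""
    conn.executescript("""
        CREATE TABLE Translations (
            Id                   INTEGER PRIMARY KEY,  -- one row per string per language
            TranslatableStringId INTEGER NOT NULL,     -- shared by all translations of one string
            Language             TEXT    NOT NULL,
            Translation          TEXT    NOT NULL,
            UNIQUE (TranslatableStringId, Language)    -- assumption: one translation per locale
        );
        CREATE TABLE Products (
            ProductId     INTEGER PRIMARY KEY,
            DescriptionId INTEGER NOT NULL              -- refers to Translations.TranslatableStringId
        );
    """)

conn = sqlite3.connect(":memory:")
create_schema(conn)
# Hypothetical sample data: string 1 describes a product, string 123 is page text.
conn.executemany(
    "INSERT INTO Translations (TranslatableStringId, Language, Translation) VALUES (?, ?, ?)",
    [
        (1, "en-US", "Red bicycle"),
        (1, "fr-FR", "Vélo rouge"),
        (123, "en-US", "Choose a product below."),
    ],
)
conn.execute("INSERT INTO Products (ProductId, DescriptionId) VALUES (10, 1)")
```

Because the same table serves product descriptions and free-standing page text, adding a new kind of translatable content only needs a new TranslatableStringId, not a new table.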

For products, you'd query like:

SELECT     *
FROM       Products p
INNER JOIN Translations t 
ON         p.DescriptionId = t.TranslatableStringId
AND        t.language = 'en-US'
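The product query can be tried end to end with Python's sqlite3 module. This sketch recreates a hypothetical Translations/Products pair with made-up rows and passes the language as a bound parameter instead of a literal; both of those are assumptions, not part of the answer.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE Translations (
        Id INTEGER PRIMARY KEY,
        TranslatableStringId INTEGER NOT NULL,
        Language TEXT NOT NULL,
        Translation TEXT NOT NULL
    );
    CREATE TABLE Products (ProductId INTEGER PRIMARY KEY, DescriptionId INTEGER NOT NULL);
    -- Hypothetical sample data: string 1 has two translations, string 2 only one.
    INSERT INTO Translations (TranslatableStringId, Language, Translation) VALUES
        (1, 'en-US', 'Red bicycle'), (1, 'fr-FR', 'Vélo rouge'), (2, 'en-US', 'Blue kettle');
    INSERT INTO Products (ProductId, DescriptionId) VALUES (10, 1), (20, 2);
""")

def products_in(conn: sqlite3.Connection, language: str) -> list[tuple[int, str]]:
    """Same INNER JOIN as the query above, with the language as a bound parameter."""
    return conn.execute(
        """
        SELECT     p.ProductId, t.Translation
        FROM       Products p
        INNER JOIN Translations t
        ON         p.DescriptionId = t.TranslatableStringId
        AND        t.Language = ?
        ORDER BY   p.ProductId
        """,
        (language,),
    ).fetchall()
```

With this data, products_in(conn, "fr-FR") returns only product 10: an INNER JOIN silently drops any product that has no translation in the requested language.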

For a piece of explanatory text, a simple query does the job:

SELECT     t.Translation
FROM       Translations t 
WHERE      t.TranslatableStringId = 123 -- ID of string
AND        t.language = 'en-US'
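The single-string lookup wraps naturally in a small helper. This is a sketch in Python's sqlite3; the function name translate, the choice to return None for a missing locale, and the sample rows are all hypothetical.

```python
import sqlite3
from typing import Optional

def translate(conn: sqlite3.Connection, string_id: int, language: str) -> Optional[str]:
    """Return the translation of one string, or None if that locale has no row."""
    row = conn.execute(
        "SELECT t.Translation FROM Translations t "
        "WHERE t.TranslatableStringId = ? AND t.Language = ?",
        (string_id, language),
    ).fetchone()
    return row[0] if row else None

# Hypothetical sample data for string 123 (the explanatory text).
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE Translations (Id INTEGER PRIMARY KEY, TranslatableStringId INTEGER, "
    "Language TEXT, Translation TEXT)"
)
conn.executemany(
    "INSERT INTO Translations (TranslatableStringId, Language, Translation) VALUES (?, ?, ?)",
    [(123, "en-US", "Choose a product below."), (123, "de-DE", "Wählen Sie unten ein Produkt.")],
)
```

Returning None rather than raising leaves the caller free to fall back to a default language.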

P.S. For a real program, I'd use a shorter name than TranslatableStringId, such as tsid, because translations tend to pop up everywhere.

Andomar 2009-11-07 17:14:26

Please don't use shorthand for a column name. There's nothing worse than having to figure out someone else's shorthand when there are more than enough characters available for a readable, informative column name. This is especially true with an ORM, where you're less likely to have referential integrity to fall back on.

OMG Ponies 2009-11-07 17:43:47

Do you need the `TranslatableStringId` column? Wouldn't that be the `Id` column? It's just odd that `Id` looks like the PK, but you don't reference it. I'd still name it something more obvious: `TRANSLATION_ID` or `LOCALIZATION_ID`...

OMG Ponies 2009-11-07 17:46:55

One `id` identifies a string in one language. A `TranslatableStringId` identifies all the translations of the same string.

Andomar 2009-11-07 17:52:15

So how do you guarantee that 123 is always associated with 'en-US'?

OMG Ponies 2009-11-07 18:20:59

It's up to the translator to make sure 123 is translated for every supported locale. You could write a query to verify that.

Andomar 2009-11-07 18:30:54
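One possible shape for the verification query mentioned above, sketched with Python's sqlite3: cross every known string ID with every supported locale and keep the pairs that have no translation row. The SupportedLocales table is a hypothetical addition, since the thread does not say where the list of supported locales lives. Strings with no rows at all are invisible to this query; catching those would need a separate table of string IDs.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE Translations (
        Id INTEGER PRIMARY KEY,
        TranslatableStringId INTEGER NOT NULL,
        Language TEXT NOT NULL,
        Translation TEXT NOT NULL
    );
    -- Hypothetical: the set of locales the application promises to support.
    CREATE TABLE SupportedLocales (Language TEXT PRIMARY KEY);
    INSERT INTO SupportedLocales VALUES ('en-US'), ('fr-FR');
    INSERT INTO Translations (TranslatableStringId, Language, Translation) VALUES
        (123, 'en-US', 'Choose a product below.'),
        (123, 'fr-FR', 'Choisissez un produit ci-dessous.'),
        (124, 'en-US', 'Out of stock');  -- no fr-FR row yet
""")

MISSING_TRANSLATIONS = """
    SELECT s.TranslatableStringId, l.Language
    FROM (SELECT DISTINCT TranslatableStringId FROM Translations) s
    CROSS JOIN SupportedLocales l
    LEFT JOIN Translations t
           ON t.TranslatableStringId = s.TranslatableStringId
          AND t.Language = l.Language
    WHERE t.Id IS NULL          -- no translation for this string in this locale
    ORDER BY s.TranslatableStringId, l.Language
"""

missing = conn.execute(MISSING_TRANSLATIONS).fetchall()
```

Run as part of a release check, an empty result means every known string exists in every supported locale.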

Answer 2

+1 A:

Short answer: As a general rule, there is nothing wrong with using a TVF for this sort of thing, but I would suggest making the ID a parameter as well:

CREATE FUNCTION [dbo].[LocalizedProducts](@ID int, @locale nvarchar(50))
RETURNS TABLE
AS RETURN
(
    -- Fall back to the base-language text when no translation exists
    SELECT a.ProductID,
           COALESCE(b.Name, a.Name) AS [Name],
           COALESCE(b.Description, a.Description) AS [Description],
           a.SomeAttribute
    FROM tblProducts a
    LEFT OUTER JOIN tblProductsLocalization_Locale b
        ON a.ProductID = b.ProductID AND b.[Language] = @locale
    WHERE a.ProductID = @ID  -- the ID filter now lives inside the function
)

Used like so:

select * from LocalizedProducts(1, 'en-US')
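SQLite has no table-valued functions, so the following only approximates the idea: a Python function standing in for LocalizedProducts that runs the same parameterized LEFT JOIN with COALESCE fallback, with both the ID and the locale filter in one statement. The table names reuse tblProducts and tblProductsLocalization from the answer; the column layout and sample rows are assumptions.

```python
import sqlite3
from typing import Optional

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE tblProducts (
        ProductID INTEGER PRIMARY KEY, Name TEXT, Description TEXT, SomeAttribute INTEGER
    );
    CREATE TABLE tblProductsLocalization (
        ProductID INTEGER, Language TEXT, Name TEXT, Description TEXT,
        PRIMARY KEY (ProductID, Language)
    );
    -- Hypothetical sample rows: only the French name is translated.
    INSERT INTO tblProducts VALUES (1, 'Bicycle', 'A red bicycle', 5);
    INSERT INTO tblProductsLocalization VALUES (1, 'fr-FR', 'Vélo', NULL);
""")

def localized_product(conn: sqlite3.Connection, product_id: int, locale: str) -> Optional[tuple]:
    """Approximates LocalizedProducts(@ID, @locale): both filters in a single query."""
    return conn.execute(
        """
        SELECT a.ProductID,
               COALESCE(b.Name, a.Name)               AS Name,
               COALESCE(b.Description, a.Description) AS Description,
               a.SomeAttribute
        FROM tblProducts a
        LEFT OUTER JOIN tblProductsLocalization b
               ON a.ProductID = b.ProductID AND b.Language = ?
        WHERE a.ProductID = ?
        """,
        (locale, product_id),
    ).fetchone()
```

The COALESCE calls work per column, so a partially translated product mixes the translated name with the base-language description instead of disappearing.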

Longer explanation: I haven't tried anything like this in SQL Server 2008 yet, so it's possible that SQL Server optimizes this issue away.

My experience in earlier versions, though, suggests that SQL Server tends to handle user-defined functions procedurally rather than declaratively: instead of interpreting what you want and then working out the best way to get it, it performs the instructions you've written, in order. So it appears to me that this method would:

1. Select all English-language text, placing it into a table variable.
2. Take the results of step #1 and select any records with the given ID.

This would waste a lot of cycles putting mostly unused English text into the table variable before applying the ID filter to that result set. Putting all of the filters into the UDF, on the other hand, lets SQL Server decide whether it's cheaper to filter by ID first (more likely, assuming a standard indexing scheme) and then apply the locale filter, or vice versa. Either way, keeping all your filters in one place should mean less data moved around in the background, and therefore better performance. Again, this all assumes that SQL Server hasn't made big leaps in optimization; if it has, that's even more reason to say that there is no problem using the TVF.

kcrumley 2009-11-07 17:32:56

How would this behave for a product list?

Andomar 2009-11-07 17:34:59

Answer 3

A:

I wanted to come back with an answer to this after doing a lot more testing. It appears to me that SQL Server 2008 actually looks inside the TVF when building the query plan and optimizes accordingly.

For instance:

select pr.* from LocalizedProducts('en-US') pr inner join LocalizedPhotos('en-US') ph on 
ph.ProductId=pr.Id where pr.SomeUnindexProperty= 5

This query needs to touch 4 tables:

Products
Products_Localization
Photos
Photos_Localization
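To picture what "looking inside" an inline TVF means, here is a sketch of the same query with both functions expanded into one statement over the four tables, written with Python's sqlite3. The two CTEs are stand-ins for LocalizedProducts and LocalizedPhotos; their column names and the COALESCE fallback are assumptions, since the original function definitions are not shown. Whether a planner actually pushes the SomeUnindexProperty filter down is up to the engine; this only shows the logical shape the optimizer gets to work with.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE Products (Id INTEGER PRIMARY KEY, Name TEXT, SomeUnindexProperty INTEGER);
    CREATE TABLE Products_Localization (ProductId INTEGER, Language TEXT, Name TEXT);
    CREATE TABLE Photos (Id INTEGER PRIMARY KEY, ProductId INTEGER, Caption TEXT);
    CREATE TABLE Photos_Localization (PhotoId INTEGER, Language TEXT, Caption TEXT);
    -- Hypothetical sample rows.
    INSERT INTO Products VALUES (1, 'Bicycle', 5), (2, 'Kettle', 7);
    INSERT INTO Products_Localization VALUES (1, 'fr-FR', 'Vélo');
    INSERT INTO Photos VALUES (100, 1, 'Side view'), (101, 2, 'Top view');
    INSERT INTO Photos_Localization VALUES (100, 'fr-FR', 'Vue de côté');
""")

# Each CTE plays the role of one localization TVF (assumed columns);
# the outer query is the join from the text.
EXPANDED = """
    WITH LocalizedProducts AS (
        SELECT p.Id, COALESCE(pl.Name, p.Name) AS Name, p.SomeUnindexProperty
        FROM Products p
        LEFT JOIN Products_Localization pl ON pl.ProductId = p.Id AND pl.Language = :lang
    ),
    LocalizedPhotos AS (
        SELECT ph.Id, ph.ProductId, COALESCE(phl.Caption, ph.Caption) AS Caption
        FROM Photos ph
        LEFT JOIN Photos_Localization phl ON phl.PhotoId = ph.Id AND phl.Language = :lang
    )
    SELECT pr.Id, pr.Name, ph.Caption
    FROM LocalizedProducts pr
    INNER JOIN LocalizedPhotos ph ON ph.ProductId = pr.Id
    WHERE pr.SomeUnindexProperty = :value
"""

rows = conn.execute(EXPANDED, {"lang": "fr-FR", "value": 5}).fetchall()
```

Written out this way, the four-table shape is explicit, which is consistent with the later observation that the localized queries touch twice as many tables as direct queries.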

The query plan looks like this (let me see if I can format it):

Products gets a Clustered Index Seek
    -> nested loop with Photos
        -> nested loop with Products_Localization
            -> nested loop with Photos_Localization

This is not what you would expect if the TVF were a black box. The simple fact that Products gets an index seek suggests to me that the optimizer does not blindly evaluate the entire TVF.

I ran a lot of performance tests, and on average the localization TVFs are between 50% and 100% slower than direct table queries. That is to be expected, since the TVFs involve twice as many tables as the normal queries.

Radu094 2009-11-12 19:27:18
