Hi! I'm looking for opinions on the following localization technique:

We start with 2 tables:

tblProducts: ProductID, Name, Description, SomeAttribute
tblProductsLocalization: ProductID, Language, Name, Description

and a table-valued function:

CREATE FUNCTION [dbo].[LocalizedProducts](@locale nvarchar(50))
RETURNS TABLE
AS
RETURN (
    SELECT a.ProductID,
           COALESCE(b.Name, a.Name) AS [Name],
           COALESCE(b.Description, a.Description) AS [Description],
           a.SomeAttribute
    FROM tblProducts a
    LEFT OUTER JOIN tblProductsLocalization_Locale b
        ON a.ProductID = b.ProductID AND b.[Language] = @locale
)

What I plan to do is use the function whenever I need localized data returned:

select * from LocalizedProducts('en-US') where ProductID=1

instead of

select * from tblProducts where ProductID=1

I'm interested in whether there are major performance concerns around this, or any showstoppers. Any reasons I shouldn't adopt this?
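
To make the fallback behaviour concrete, here is a minimal sketch of the same LEFT JOIN plus COALESCE pattern. It assumes an in-memory SQLite database driven from Python rather than SQL Server. SQLite has no parameterized table-valued functions, so a hypothetical Python helper, `localized_products`, stands in for the TVF, and the sample rows are invented for illustration.

```python
import sqlite3

# Assumption: in-memory SQLite stands in for SQL Server; sample rows are invented.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE tblProducts (
        ProductID INTEGER PRIMARY KEY, Name TEXT, Description TEXT, SomeAttribute INTEGER);
    CREATE TABLE tblProductsLocalization_Locale (
        ProductID INTEGER, Language TEXT, Name TEXT, Description TEXT);
    INSERT INTO tblProducts VALUES (1, 'Chair', 'A plain chair', 2);
    INSERT INTO tblProducts VALUES (2, 'Table', 'A plain table', 2);
    -- Only the name of product 1 is translated into French.
    INSERT INTO tblProductsLocalization_Locale VALUES (1, 'fr-FR', 'Chaise', NULL);
""")

def localized_products(conn, locale, product_id):
    """Hypothetical stand-in for the TVF: localized text if present, base text otherwise."""
    return conn.execute(
        """
        SELECT a.ProductID,
               COALESCE(b.Name, a.Name) AS Name,
               COALESCE(b.Description, a.Description) AS Description,
               a.SomeAttribute
        FROM tblProducts a
        LEFT OUTER JOIN tblProductsLocalization_Locale b
               ON a.ProductID = b.ProductID AND b.Language = ?
        WHERE a.ProductID = ?
        """,
        (locale, product_id),
    ).fetchall()

french = localized_products(conn, "fr-FR", 1)   # translated name, base description
english = localized_products(conn, "en-US", 1)  # no en-US row, so base values only
print(french)
print(english)
```

The point of the sketch is only the column-by-column fallback: a missing translation row, or a NULL column within an existing row, falls back to the value in the base table.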

Edit: I've tagged this SQL2005. Although I develop this using 2008, I think the deployment target only has SQL2005. I could upgrade to 2008 if the need arises, though.

Later edit:

I have created a view, with identical content, but without the parameter:

CREATE VIEW [dbo].[LocalizedProductsView]
AS
SELECT b.Language, a.ProductID,
       COALESCE(b.Name, a.Name) AS [Name],
       COALESCE(b.Description, a.Description) AS [Description],
       a.SomeAttribute
FROM tblProducts a
LEFT OUTER JOIN tblProductsLocalization_Locale b ON a.ProductID = b.ProductID

I then ran some tests. The estimated execution plan looks identical for both queries:

select * from LocalizedProducts('en-US') where SomeNonIndexedParameter=2

select * from LocalizedProductsView where (Language='en-US' or Language is null) and SomeNonIndexedParameter=2

The final question that arises is: should I understand that the TVF computes the translations for ALL the products, regardless of the WHERE conditions? Is the view doing the same thing?
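
One thing worth checking before treating the two queries as interchangeable: filtering the view on Language after the join is not the same as filtering inside the join condition. A product that has translations only in other languages produces no row with a NULL or matching Language, so it drops out of the view query. A minimal sketch, assuming in-memory SQLite driven from Python instead of SQL Server, with invented sample rows and the TVF body inlined as a plain query:

```python
import sqlite3

# Assumption: in-memory SQLite stands in for SQL Server; sample rows are invented.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE tblProducts (ProductID INTEGER PRIMARY KEY, Name TEXT);
    CREATE TABLE tblProductsLocalization_Locale (ProductID INTEGER, Language TEXT, Name TEXT);
    INSERT INTO tblProducts VALUES (1, 'Chair'), (2, 'Table');
    -- Product 1 is translated into French only; product 2 has no translations.
    INSERT INTO tblProductsLocalization_Locale VALUES (1, 'fr-FR', 'Chaise');
    CREATE VIEW LocalizedProductsView AS
        SELECT b.Language, a.ProductID, COALESCE(b.Name, a.Name) AS Name
        FROM tblProducts a
        LEFT OUTER JOIN tblProductsLocalization_Locale b ON a.ProductID = b.ProductID;
""")

# TVF-style: the locale filter is part of the join condition.
tvf_style = conn.execute("""
    SELECT a.ProductID, COALESCE(b.Name, a.Name)
    FROM tblProducts a
    LEFT OUTER JOIN tblProductsLocalization_Locale b
           ON a.ProductID = b.ProductID AND b.Language = ?
    ORDER BY a.ProductID
""", ("en-US",)).fetchall()

# View-style: the locale filter is applied after the join.
view_style = conn.execute("""
    SELECT ProductID, Name FROM LocalizedProductsView
    WHERE Language = ? OR Language IS NULL
    ORDER BY ProductID
""", ("en-US",)).fetchall()

print(tvf_style)   # both products, with base names
print(view_style)  # product 1 is missing: its only joined row has Language 'fr-FR'
```

So identical estimated plans on a particular data set do not guarantee identical results in general; the view query would need a different formulation to match the TVF for partially translated products.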

A: 

It's a safe bet that you'll have to translate more than product names. So I'd design the translation solution to handle any kind of string.

For example, you could have a localization table like:

Id, TranslatableStringId, Language, Translation

Then each product could have a translatable string associated with it, and so could the explanatory text above the product list.

For products, you'd query like:

SELECT     *
FROM       Products p
INNER JOIN Translations t 
ON         p.DescriptionId = t.TranslatableStringId
AND        t.language = 'en-US'

For an explanatory text, you'd use a simple:

SELECT     t.Translation
FROM       Translations t 
WHERE      t.TranslatableStringId = 123 -- ID of string
AND        t.language = 'en-US'

P.S. For a real program, I'd use a shorter name than TranslatableStringId, like tsid, because translations tend to pop up everywhere.

Andomar
Please don't use shorthand for a column name - there's nothing worse than having to figure out someone else's shorthand when there are more than enough characters to support a readable, informative column name. Especially in ORM, where you're less likely to have referential integrity to fall back on.
OMG Ponies
Do you need the `TranslatableStringId` column - wouldn't that be the `Id` column? It's just odd that `Id` looks like the pk, but you don't reference it. I'd still name it something more obvious: `TRANSLATION_ID` or `LOCALIZATION_ID`...
OMG Ponies
One `Id` identifies a string in one language. A `TranslatableStringId` identifies multiple translations of the same string.
Andomar
So how do you guarantee that 123 is always associated with 'en-US'?
OMG Ponies
It's up to the translator to make sure 123 is translated for every supported locale. You could write a query to verify that.
Andomar
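
A verification query along those lines could look like the sketch below. It assumes in-memory SQLite driven from Python, invented sample rows, and a hypothetical `SupportedLanguages` table listing the locales that must be covered; none of these appear in the thread.

```python
import sqlite3

# Assumption: in-memory SQLite; SupportedLanguages is a hypothetical helper table.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE Translations (
        Id INTEGER PRIMARY KEY, TranslatableStringId INTEGER, Language TEXT, Translation TEXT);
    CREATE TABLE SupportedLanguages (Language TEXT PRIMARY KEY);
    INSERT INTO SupportedLanguages VALUES ('en-US'), ('fr-FR');
    INSERT INTO Translations (TranslatableStringId, Language, Translation) VALUES
        (123, 'en-US', 'Our products'),
        (123, 'fr-FR', 'Nos produits'),
        (124, 'en-US', 'Chair');
""")

# Pair every known string with every supported language, then keep the
# pairs for which no translation row exists.
missing = conn.execute("""
    SELECT s.TranslatableStringId, l.Language
    FROM (SELECT DISTINCT TranslatableStringId FROM Translations) s
    CROSS JOIN SupportedLanguages l
    LEFT JOIN Translations t
           ON t.TranslatableStringId = s.TranslatableStringId
          AND t.Language = l.Language
    WHERE t.Id IS NULL
    ORDER BY s.TranslatableStringId, l.Language
""").fetchall()
print(missing)  # string 124 has no fr-FR translation
```

A unique constraint on (TranslatableStringId, Language) would additionally keep a string from being translated twice for the same locale.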
+1  A: 

Short answer: As a general rule, there is nothing wrong with using a TVF for this sort of thing, but I would suggest also making the ID a parameter:

CREATE FUNCTION [dbo].[LocalizedProducts](@ID int, @locale nvarchar(50))
RETURNS TABLE
AS
RETURN (
    SELECT a.ProductID,
           COALESCE(b.Name, a.Name) AS [Name],
           COALESCE(b.Description, a.Description) AS [Description],
           a.SomeAttribute
    FROM tblProducts a
    LEFT OUTER JOIN tblProductsLocalization_Locale b
        ON a.ProductID = b.ProductID AND b.[Language] = @locale
    WHERE a.ProductID = @ID
)

Used like so:

select * from LocalizedProducts(1, 'en-US')

Longer explanation: I've never tried something like this in SQL 2008 yet, so it's possible that SQL Server can optimize this issue away.

My experience with earlier versions, though, suggests that SQL Server tends to handle user-defined functions in a more procedural than declarative fashion: it doesn't interpret what you want and then figure out the best way to get it, but actually performs the instructions you've written, in order. So it appears to me that this method would:

  1. select all English-language text, placing it into a table variable.
  2. take the results of step #1 and select any records with the given ID.

This would mean a lot of wasted cycles putting mostly unused English text into the table variable before applying the ID filter to that result set. On the other hand, putting all of the filters into the UDF would let SQL Server determine whether it's easiest to filter by ID first (more likely, assuming a standard indexing scheme) and then apply the locale filter, or vice versa. Either way, less data should be moved around in the background, and performance should be better, if you put all your filters in one spot. Again, this all assumes that SQL Server has not since made giant leaps in optimization. If it has, that's even more reason to say that there is no problem using the TVF.

kcrumley
How would this behave for a product list?
Andomar
A: 

I wanted to come back with an answer to this after doing a lot more testing. It appears to me that SQL2008 is actually looking inside the TVF when building the query plan, and optimizing accordingly.

For instance:

select pr.* from LocalizedProducts('en-US') pr
inner join LocalizedPhotos('en-US') ph on ph.ProductId=pr.Id
where pr.SomeUnindexProperty= 5

This query needs to touch 4 tables:

Products
Products_Localization
Photos
Photos_Localization

The query plan looks like this (let me see if I can format this):

Products gets a Clustered Index Seek
    -->> Products gets nested loop with Photos
        -->> nested loop Products_Localization
            -->> nested loop Photos_Localization

This is not what you would expect if the TVF were a black box. The simple fact that Products gets an index SEEK suggests to me that the optimizer does not blindly evaluate the entire TVF.

I ran a lot of performance tests, and on average the "localization" TVFs are between 50% and 100% slower than direct table queries. That is to be expected, as the TVFs involve twice as many tables as the normal queries.

Radu094