views:

205

answers:

3

I need database structure for storing versions of site's content in different languages. Right now I'm doing it like this:

[Item]  
Id  
SomeColumn  

[ItemStrings]  
ItemId  
LanguageId  
Title  
Description  
...

[Languages]  
Id  
Culture

Although, it is a pretty neat way to do translation, it requires a lot of monkey coding when adding new entities into the system.
The other solution, that I thought of, was some global table for ALL strings that need to be translated, with unique id and language id as primary key.
I like the second way much more because it is more DRY.

Now, real question is: can I use nvarchar(MAX) for all my records? Will it consume much more memory, when, say only 20% of values will be worth varchar(max) and others would easily fit in nvarchar(50-something)?

I'm using SQL Server 2008.

+1  A: 

The second approach is closer to how localization is typically done (each string has an ID of some sort and can be looked up for various languages.)

Regarding using nvarchar(MAX), that should be fine. varchar types only use as much space as they need.

Ben S
I think, I will stick to this solution in future releases then. Thanks for confirmation.
+2  A: 

Last project I worked on, the key was the English phrase, rather than an ID. It made code easier to write. You call a method GetTranslationFor("English Phrase", culture); as opposed to GetTranslationFor(123, culture);

Then, your devs just write code, and aren't spending time looking for IDs for the phrase they want, or adding them. Have the GetTranslationFor method send an email notification to an admin if it doesn't find a translation for the phrase in the database, so it can be added, BUT fall back to the phrase entered as a result.

It's better to show an english phrase on a french site, than some error or nothing.

And nvarchar max should be fine.

ps What I was saying was your second method looks good, I'd just add an extra key/index using the english phrase as your key.

Chad
gettext (http://www.gnu.org/software/gettext/) uses this technique.
Daniel Straight
But there's no reason you couldn't default to English even with using numeric IDs.
Daniel Straight
No, there's no reason you couldn't default to English by ID, that wasn't the point. The point was that your programmers waste time checking what the ID is for the phrase "Main Menu" and if it doesn't exist, they waste time creating it. I'm not going to remember the IDs for 100s of phrases, and I don't want to spend time looking in the DB for them. I'd rather write the code with the english phrase, and leave it up to the translators/admin to add them with their appropriate translations to the database.
Chad
Problem with this approach, is that I don't have _main_ language. So I don't have key phrases to translate. I just need separate versions of content for different languages.
A: 

I'd have a key column (named entities, e.g., "OKAY", "SaveAs" etc - nvarchar(32) should suffice) and a language column (I'd use a code, like EN-UK for British English, so char(5)) as these are standardised. These two columns would be a unique index/primary key. I'd then have a column for the actual text - as much nvarchar length as you like. I guess that the column/table/database should be in utf8?

You then only have a single, non-joined request to the database for a string.

I'd also have a fall-back for when a translation isn't available, so make your request get EN-US if the requested language doesn't have a value. Better to have something than nothing.

JeeBee