So far there've been several questions regarding this, and they've all come down to the same answer: one table for the language-neutral data, 1-* to a table with the translations and an indexed language ID field.
This has several problems:
- Twice as much CRUD.
- Need for Ajax CRUD if you want a decently friendly web UI.
- More than twice the validation -- you need to ensure that the relationship is 1-* rather than 0-*.
- Collation differences between languages isn't accommodated.
- Queries require joins.
- If you want slugs in multiple languages, oh boy.
A lot of database people have worked on all sorts of theoretical and practical problems, but surprisingly few people work on this one.
I think what we need ultimately is:
- A field type that'll store multiple versions of strings
- Multiple indices for each such field, one for each language or variation, with the option to specify the correct collation mode
- A standard ORM object for this crazy thing
- UI elements
Overkill? Sure, maybe, but the whole problem is a real nightmare as it is. And it's not exactly an uncommon scenario.
We gotta try to convince server vendors to work on this.
Edit: By the way, this is my first time using the community wiki; hopefully I'm doing it right.
Edit 2: Something about my wording seems to have made people think that I'm attacking the very concept of DBMS. I'm not; I'm simply saying that built-in support for localization is a much-needed feature.
I probably shouldn't have mentioned performance; it's of course completely negligible most of the time. The focus of my concern is on the fact that this really stifles productivity.
I'll provide an example. Suppose I have a very trivial table for a decidedly trivial store:
Products (id, price, description, name, slug)
In EF/MVC, I'd throw this in the ORM designer, maybe encapsulate it in a repository, build a Products controller, and have actions for Index, Details, Create, Update, Edit and Delete. To identify a product in any of the items, I'd simply do a WHERE(slug = @slug). I'd make a view model for the create/edit actions, design the form control, and wire it up straight to the repository. Done and done. To access the details for a product, the user would go to /products/details/product-slug
.
But then since the rest of the website is bilingual, I decide to change the products table accordingly.
Products (id, price)
ProductsText (productId, language, description, name, slug)
Hey, that's not so bad. Yeah, not yet. Then you write your relationships and your constraints, and then you write you write out all your properties in the view-model, and then you make a complete CRUD controller for the ProductsText data or use jQuery/Ajax to add create/update/edit buttons on your Products controller, and then you add validation logic to make sure the user enters at least the primary language, and then when you want to read data for the end-user pages you write another query to take join ProductsText.slug and ProductsText.language with Products... I probably missed something, but you get the idea.
The complexity of the program just explodes with boilerplate code once you have localization involved.
Of course, I don't expect the problem to be solved completely, and it's obviously just as much a UI problem as it is a database problem. But there's just so much that could be done to make all this easier. A "multistring" field type might be a really good start.
Edit 3: Anyone ever hear of SQL Server Modeling Services? It has some localization tools in it that could be a step in the right direction. Still CTP though.
-- Simulate the French locale with the SET LANGUAGE statement.
SET LANGUAGE French
select Id, CountryName,
[System.Globalization].[SessionsString](CountryName, 1) as CountryNameString
from [Location].[CountriesTable]