What is the best way to design domain objects that have multilingual fields? An example would be a Product class whose Description is multilingual.

I have found a few links but could not decide which approach is best.

  1. http://fabiomaulo.blogspot.com/2009/06/localized-property-with-nhibernate.html
    (This stores all localised language data in one field. Can be a problem if we query from Sql)

  2. http://ayende.com/Blog/archive/2006/12/26/LocalizingNHibernateContextualParameters.aspx
    (This one has a warning at the beginning that it is a hack and no longer supported)

  3. http://www.webdevbros.net/2009/06/24/create-a-multi-languaged-domain-model-with-nhibernate-and-c/
    (This does not describe how multilingual data will be structured in the database.)

Does anyone have experience using NHibernate with multilingual data? Is there a better way?

+1  A: 

The third option looks great. The hibernate mapping is given, but not the database schema - if that's what you are missing, then I'll sketch it out here:

dictionary
----------
ID: int - identity
name: nvarchar(255)

phrase
------
dictionary_id:int  (fkey dictionary.ID)
culture_id:int     (LCID)
phrase:nvarchar(255)  - this is the default size - seems too small

According to this blog entry, 255 is the default string length for String values. To overcome the short string length on the phrase text, you can change the <element> tag to

<element column="phrase" type="String" length="4001"></element>

To use this in your domain model, you add a PhraseDictionary property to your entity wherever you want translatable text, e.g. the title property or the description property.
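As a rough illustration of what such a property looks like from the entity's side, here is a minimal plain-Java sketch with no mapping or persistence. The class shape, the `put`/`get` methods, and the fallback culture are assumptions made for this example; they are not taken from the linked article.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical in-memory shape of a phrase dictionary: one phrase per culture (LCID).
class PhraseDictionary {
    private final Map<Integer, String> phrases = new HashMap<>();

    void put(int cultureId, String phrase) {
        phrases.put(cultureId, phrase);
    }

    // Returns the phrase for cultureId; if none exists, returns the phrase for
    // fallbackCultureId (which may itself be null).
    String get(int cultureId, int fallbackCultureId) {
        String phrase = phrases.get(cultureId);
        return phrase != null ? phrase : phrases.get(fallbackCultureId);
    }
}

// Hypothetical entity: each translatable field is its own dictionary.
class Product {
    final PhraseDictionary title = new PhraseDictionary();
    final PhraseDictionary description = new PhraseDictionary();
}
```

With descriptions stored for LCID 1033 (en-US) and 1031 (de-DE), `product.description.get(1036, 1033)` would fall back to the en-US text because no fr-FR phrase exists.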

I think the article describes a great approach, and is the one that I would go for.

EDIT: In response to the comments, make the length less than 4001 if you know the absolute maximum size is less than that, as this will typically be faster. Also, NHibernate will fetch the collection lazily, but it may fetch all the items at once. You can profile to determine whether this has any performance implications. (If you have only a handful of languages then I doubt you will see a difference.) If you have many languages (say 50+) then it may be worthwhile creating custom properties to fetch the localized text. These will issue queries to fetch specifically the text required. More importantly, you may be able to fetch all the text for a given entity in one query, rather than fetching each localized text property with a separate query.

Note that this extra effort is only needed if profiling gives you reason to be concerned about the performance. Chances are that the implementation in the article as is will function more than adequately.

mdma
Note that with a length over 4000, SQL Server (>= 2005) will place the data in the LOB structure instead of the table structure. Additionally, keep in mind that the third solution will fetch the 'phrase' for all languages, with whatever performance implications that brings.
Jaguar
@Jaguar - you're right about nvarchar(4001). If you know the phrases are shorter than 4000 then set an appropriate size. As to fetching all phrases, I'm less sure about that. It appears NHibernate does lazy loading by default, so I would hope it only loads the set on demand. But then it might load the entire set. You can avoid this by writing a criteria query to fetch the phrase for the current culture. You can put this behind a getter method in your entity.
mdma
+1  A: 

I only have experience with Hibernate, but since NHibernate is so similar:

One option is to define a component type MultiLingualString with members for each language (this assumes the set of languages is known at coding time). This type is also a convenient place to put a getter for the string by language id.

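// Language is assumed to be an enum with one constant per supported language.
enum Language { ENGLISH, CHINESE, KLINGON }

class MultiLingualString {
    String english;
    String chinese;
    String klingon;

    // Returns the member that holds the text for the requested language.
    String forLanguage(Language lang) {
        switch (lang) {
            case ENGLISH: return english;
            case CHINESE: return chinese;
            case KLINGON: return klingon;
            default: throw new IllegalArgumentException("Unsupported language: " + lang);
        }
    }
}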

This results in the strings for all languages being stored in separate columns in the database while the representation in the object world retains fine granularity.

The advantage is that no join is required to fetch the strings. On the other hand, the only way not to fetch a string with this approach is to use a projection, which is a severe limitation if the strings are large, numerous and rarely needed.

If you do this a lot, writing a UserType might be worth it.

meriton
The problem with this approach is that we have multiple columns in a table that can be multilingual. If there are n such columns and m languages, then it can mean (m * n) extra columns in the table, if I am not wrong.
Amitabh
A *total* of (m * n) string columns. Yes, there is a trade-off here. But since I don't know m and n it seemed worth mentioning that possibility.
meriton
+1  A: 

From a strictly database-oriented standpoint with SQL Server, you should have one table with all of the base data (record key, dates, numbers, etc.) and one table with all of the translatable string data. Let's call the two tables Base and Base_Description.

Base ensures that there is a single key for each record. The key might be a string or an auto-generated id, depending on your particular use case.

The Base_Description table is related to the Base table, but also contains a value to select the language that the data is in. In my projects we use the langid column from sys.syslanguages because we can set the language of the connection with SET LANGUAGE and then grab it with @@LANGID for most operations.

In our testing we found this to be significantly faster than having multiple fields for each language. It also allows you to add other languages more easily. We are also using SQL Server Full-Text indexing, and it works fully with this method. You should index in the neutral language, and then you can pick the language to search against at run time (also filtering against the LangID column in Base_Description).
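To make the two-table shape concrete, here is a small in-memory Java sketch. It is only a stand-in for the real tables: the class names, the `weightKg` field, and the string-based composite key are assumptions for illustration, and the language ids used in the usage note are example values rather than something to rely on.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical row of the Base table: language-neutral data only.
class BaseRow {
    final int id;
    final double weightKg;

    BaseRow(int id, double weightKg) {
        this.id = id;
        this.weightKg = weightKg;
    }
}

// Stand-in for the Base_Description table, keyed by (base id, language id).
class BaseDescriptionTable {
    private final Map<String, String> rows = new HashMap<>();

    // Builds the composite key; a real table would use a two-column primary key.
    private static String key(int baseId, int langId) {
        return baseId + ":" + langId;
    }

    void insert(int baseId, int langId, String description) {
        rows.put(key(baseId, langId), description);
    }

    // Similar in spirit to filtering on the base id and on LangID = @@LANGID.
    // Returns null when no translation exists for that language.
    String select(int baseId, int langId) {
        return rows.get(key(baseId, langId));
    }
}
```

Adding a language then means inserting more rows, not altering the table: after `insert(1, 0, "Red bicycle")` and `insert(1, 1, "Rotes Fahrrad")`, a lookup with language id 1 returns the German text and a lookup with an unused id returns null.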

lambacck
+1  A: 

Do your requirements include the domain objects actually having multiple-language properties in the same object? And, if so, is it unlimited translations stored in the object (in a collection, say - in which case I would say that it would need to be just like any master/detail or parent/child collection) or fixed translations, in which case the languages (and thus the mapping to results of a stored proc or whatever) have to be determined statically anyway?

In many internationalized applications I worked on, the data was in only one language - customer names, the product names (there was no point in mapping even identical products used in one country to products in another, they all had different distributors and different SKUs, and of course localized pricing). The interface was also only in one language (at a time). So all the domain objects only required one language at a time. Thus the language of the translation would be determined when the object was instantiated.

We had translation user interfaces which allowed users to update the translated texts, but these only required two languages at a time (local and the default). I can see this being closest to what you are talking about. I guess that you would have child collections for each translatable property with all the possible translations in the collection. This would probably be closest to the second solution in the third article you linked. Of course, at this point you would also need to see if you want eager/lazy loading etc.

Cade Roux
My requirement does include domain objects having multiple-language properties.
Amitabh
@Amitabh Then I would go with the conventional parent/detail structure, the same as you would use for storing one-to-many phone numbers for clients, etc.
Cade Roux