I have often wondered what data structures and design patterns are used to implement something like CSS, where you can specify formatting or other properties at different levels of granularity.

One specific example that I am working on at the moment relates to internationalization of an application.

First of all, English is the default language, but the application will be used in two regions, the Americas and Europe. In most cases the text and labels will be the same in both regions, but some technical terms differ by region. When the text is translated into another language, some strings will keep the original English text for the region.

So looking up the text for a label would work like this:

  1. Look up the specific combination of language and region.

  2. If nothing is found, look up the language alone.

  3. If there is still nothing, look up English for that region.

  4. If there is still nothing, look up English alone.

I am looking for the best ways to store this kind of data, both in a database and in a data structure within the code, either for this specific case or for this type of situation in general.

A: 

Hierarchical, of course!

database:

create table Element
(
    Id int not null primary key,
    ParentId int null references Element(Id),  --parent element; null for the root
    Tag varchar(64)                             --etc
)
create table ElementProperty
(
    Id int not null primary key,
    ElementId int not null references Element(Id),  --owning element
    PropertyName varchar(64) not null,
    PropertyValue varchar(512) null
)

object:

using System.Collections.Generic;

public class Element
{
    public string Tag;    //use a property instead though
    public Element ParentElement;
    public IDictionary<string, string> CssProperties = new Dictionary<string, string>();

    //returns the locally defined value, otherwise asks the parent chain;
    //null if no ancestor defines the property
    public string GetCssPropertyValue(string propertyName)
    {
        string value;
        if (CssProperties.TryGetValue(propertyName, out value))
        {
            return value;
        }
        if (ParentElement != null)
        {
            return ParentElement.GetCssPropertyValue(propertyName);
        }
        return null;
    }
}
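On the database side, the same parent-chain walk can be done in one query with a recursive common table expression. This is a hedged sketch rather than part of the original answer: it assumes SQLite through Python's built-in sqlite3 module, reuses the two tables above, and the sample rows and the effective_value helper are hypothetical. SQL Server writes the recursive CTE with plain WITH and uses TOP 1 instead of LIMIT 1.

```python
import sqlite3
from typing import Optional

# Same two tables as above, in an in-memory SQLite database (assumption: SQLite).
SCHEMA = """
create table Element (
    Id integer not null primary key,
    ParentId integer null references Element(Id),
    Tag varchar(64)
);
create table ElementProperty (
    Id integer not null primary key,
    ElementId integer not null references Element(Id),
    PropertyName varchar(64) not null,
    PropertyValue varchar(512) null
);
"""

# Collect the element and its ancestors (Depth 0 = the element itself),
# then take the property value from the nearest one that defines it.
EFFECTIVE_VALUE_SQL = """
with recursive Ancestors(Id, ParentId, Depth) as (
    select Id, ParentId, 0 from Element where Id = ?
    union all
    select e.Id, e.ParentId, a.Depth + 1
    from Element e join Ancestors a on e.Id = a.ParentId
)
select p.PropertyValue
from Ancestors a
join ElementProperty p on p.ElementId = a.Id
where p.PropertyName = ?
order by a.Depth
limit 1
"""

def effective_value(conn: sqlite3.Connection, element_id: int, name: str) -> Optional[str]:
    """Hypothetical helper: the inherited value of `name`, or None if undefined."""
    row = conn.execute(EFFECTIVE_VALUE_SQL, (element_id, name)).fetchone()
    return row[0] if row else None

conn = sqlite3.connect(":memory:")
conn.executescript(SCHEMA)
# Hypothetical sample rows: body > div > p
conn.executemany("insert into Element values (?, ?, ?)",
                 [(1, None, "body"), (2, 1, "div"), (3, 2, "p")])
conn.executemany("insert into ElementProperty values (?, ?, ?, ?)",
                 [(1, 1, "color", "black"), (2, 2, "color", "blue"),
                  (3, 1, "font-family", "serif")])

print(effective_value(conn, 3, "color"))        # blue (from the div)
print(effective_value(conn, 3, "font-family"))  # serif (from the body)
print(effective_value(conn, 3, "margin"))       # None
```

Ordering by depth is what gives the "nearest definition wins" behaviour that the C# method gets from recursion.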
Steven A. Lowe
A:

Cascading hash tables work well. Each hash table has a link to another hash table to use if the key is not found locally. There would be a hash table for each language-and-region pair; it links to the hash table for its language alone, which links to the hash table for English in that region, which finally links to the plain English hash table.
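In Python, the standard library's collections.ChainMap is one ready-made form of cascading tables: it searches a list of mappings in order and returns the first hit. A minimal sketch, assuming made-up keys and strings (the table names en, en_eu, de and so on are hypothetical). Because a language table such as de has to fall back to a different English table in each region, the chains are built per region while the underlying dicts stay shared.

```python
from collections import ChainMap

# Hypothetical sample tables; keys and strings are made up for illustration.
en = {"save": "Save", "hood": "Hood"}
en_eu = {"hood": "Bonnet"}     # English terms that differ in Europe
en_am = {}                     # the Americas use the plain English terms
de = {"save": "Speichern"}     # German, shared by both regions
de_eu = {}                     # German overrides for Europe only

# new_child(m) returns a new ChainMap with m in front of the existing chain,
# so each step puts a more specific table ahead of its fallback.
english = ChainMap(en)
german_eu = english.new_child(en_eu).new_child(de).new_child(de_eu)
german_am = english.new_child(en_am).new_child(de)

print(german_eu["save"])  # Speichern (found in de)
print(german_eu["hood"])  # Bonnet (falls back to en_eu)
print(german_am["hood"])  # Hood (falls back to en)
```

Updating one shared dict (say, adding a key to de) is immediately visible through every chain that contains it, since nothing is copied.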

Lua has this structure built in: its fundamental data structure is the table, and a table's metatable (through its __index field) can be used to satisfy lookups for missing keys. The same mechanism can be used to build class/object hierarchies dynamically.
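For comparison, here is a rough Python analogue of a Lua table whose metatable does setmetatable(de, {__index = en_eu}): a dict subclass whose __missing__ hook forwards absent keys to a parent table. This is an illustrative sketch; the FallbackTable class and the sample strings are hypothetical.

```python
from typing import Optional

class FallbackTable(dict):
    """Hypothetical dict that forwards missing keys to a parent table,
    similar to a Lua table whose metatable sets __index to the parent."""

    def __init__(self, entries=None, parent: Optional["FallbackTable"] = None):
        super().__init__(entries or {})
        self.parent = parent

    def __missing__(self, key):
        # dict.__getitem__ calls this only when the key is absent locally.
        if self.parent is None:
            raise KeyError(key)
        return self.parent[key]

en = FallbackTable({"save": "Save", "hood": "Hood"})
en_eu = FallbackTable({"hood": "Bonnet"}, parent=en)
de = FallbackTable({"save": "Speichern"}, parent=en_eu)

print(de["save"])  # Speichern
print(de["hood"])  # Bonnet (found via the parent link)
```

One caveat: only de[key] goes through __missing__; de.get(key) and key in de look at the local table only.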

Doug Currie
A:

As Doug Currie says, cascading tables are the way to go, so you'll have "en" resources for generic English, and "en_US" or "en_GB" for the US and UK locales respectively.

Check whether your language's libraries provide abstractions for this. For example, Java's java.util.ResourceBundle provides exactly this kind of hierarchical fallback between locales (see the javadoc for java.util.ResourceBundle).

Similarly, GNU gettext is used throughout the free software world (see the GNU gettext home page).

It's likely that your environment will have a similar API for handling i18n/l10n.

Suppressingfire
A:

If you use RFC 4646 language tags (since superseded by RFC 5646, part of BCP 47) as your language identifiers, as browsers do (e.g. "DE-AT"), they have a built-in hierarchical structure.

To process a lookup, split the identifier at the dashes and build a list of progressively shorter prefixes, ending with the empty string for the neutral/default language (e.g. English). For "DE-AT" this gives "DE-AT", "DE", "" (=> "EN"). Then use the first entry that has a match.
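The truncation step can be sketched in a few lines. This is an illustrative Python version, not the poster's code; the function names and sample strings are hypothetical, and it upper-cases tags before comparing because language tags are case-insensitive.

```python
from typing import Dict, List, Optional

def fallback_chain(tag: str) -> List[str]:
    """Hypothetical helper: "DE-AT" -> ["DE-AT", "DE", ""], i.e. progressively
    shorter prefixes, ending with "" for the neutral/default language."""
    parts = tag.split("-") if tag else []
    chain = ["-".join(parts[:n]) for n in range(len(parts), 0, -1)]
    chain.append("")
    return chain

def lookup(strings: Dict[str, Dict[str, str]], key: str, tag: str) -> Optional[str]:
    """Return `key` from the first tag in the chain whose table defines it."""
    for candidate in fallback_chain(tag):
        table = strings.get(candidate.upper(), {})
        if key in table:
            return table[key]
    return None

# Hypothetical sample data, keyed by upper-cased tag; "" holds the English defaults.
STRINGS = {
    "DE-AT": {"january": "Jänner"},
    "DE": {"january": "Januar", "save": "Speichern"},
    "": {"january": "January", "save": "Save", "help": "Help"},
}

print(fallback_chain("DE-AT"))              # ['DE-AT', 'DE', '']
print(lookup(STRINGS, "january", "de-AT"))  # Jänner
print(lookup(STRINGS, "save", "de-AT"))     # Speichern
print(lookup(STRINGS, "help", "de-AT"))     # Help
```

In SQL the same idea works by passing the candidate list as parameters and ordering the matches by tag length.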

I've successfully used this design in both SQL and file-based lookups.

If you have a large number of strings and/or localizations, caching the keys is recommended. A hash table of hash tables works well.
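A hash table of hash tables for the cache might look like the sketch below: the outer dict is keyed by language tag, the inner dict by string key. The LookupCache class and the slow_resolve stand-in for the real SQL or file lookup are hypothetical; misses are cached too, so repeated lookups of untranslated keys don't hit the store again.

```python
from typing import Callable, Dict, Optional

class LookupCache:
    """Hypothetical cache: outer dict keyed by language tag, inner by string key."""

    def __init__(self, resolve: Callable[[str, str], Optional[str]]):
        self._resolve = resolve  # the slow lookup (SQL query, file scan, ...)
        self._cache: Dict[str, Dict[str, Optional[str]]] = {}

    def get(self, key: str, tag: str) -> Optional[str]:
        per_tag = self._cache.setdefault(tag, {})
        if key not in per_tag:
            per_tag[key] = self._resolve(key, tag)  # None (a miss) is cached as well
        return per_tag[key]

calls = []

def slow_resolve(key: str, tag: str) -> Optional[str]:
    # Hypothetical stand-in for the real data-store lookup; records each call.
    calls.append((key, tag))
    return {"save": "Speichern"}.get(key) if tag == "DE" else None

cache = LookupCache(slow_resolve)
cache.get("save", "DE")
cache.get("save", "DE")
print(len(calls))  # 1: the second call was served from the cache
```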

devstuff
A:

Rather than hard-coding your sequence in the data store, you could drive it in the access layer, per this pseudo-code:

string lookup(key, language, region) {
    string result = datasearch(key, language, region)
    if result == null {
        result = datasearch(key, language, "")
    }
    if result == null {
        result = datasearch(key, "EN", region)
    }
    if result == null {
        result = datasearch(key, "EN", "")
    }
    return result
}

Benefits:

  1. All other parts of the code would call this single lookup routine, so the sequence is in exactly one place.

  2. The sequence isn't embedded in the data store.

  3. That gives you one place to look if you need to check the "rules", and one point of maintenance (with no data store maintenance) if you need to change them.
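A runnable Python rendering of the pseudo-code above, as a sketch: the in-memory STORE dict stands in for the real data store, and its contents are hypothetical. The four fallback steps are kept as one ordered list, so changing the rules means editing a single line.

```python
from typing import Callable, Optional

# Hypothetical stand-in for the data store: (key, language, region) -> text.
# An empty region string means "no specific region".
STORE = {
    ("hood", "EN", ""): "Hood",
    ("hood", "EN", "EU"): "Bonnet",
    ("save", "EN", ""): "Save",
    ("save", "DE", ""): "Speichern",
}

def datasearch(key: str, language: str, region: str) -> Optional[str]:
    return STORE.get((key, language, region))

def lookup(key: str, language: str, region: str,
           search: Callable[[str, str, str], Optional[str]] = datasearch) -> Optional[str]:
    # The fallback sequence lives here, in one place, not in the data store.
    for lang, reg in ((language, region), (language, ""), ("EN", region), ("EN", "")):
        result = search(key, lang, reg)
        if result is not None:
            return result
    return None

print(lookup("save", "DE", "EU"))  # Speichern
print(lookup("hood", "DE", "EU"))  # Bonnet
print(lookup("hood", "DE", "AM"))  # Hood
```

Passing the search function as a parameter also makes it easy to swap the dict for a real query, or for a cached one, without touching the rules.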

joel.neely