I am developing a multilingual ASP.NET web site. I want the users to be able to add translations 'on the fly', so I am storing the translations in a database and maintaining an ASP.NET Application object containing all the translations (for performance). I am using a simple Hashtable to store the translations, then I store this in the Application object and reference it when rendering the page.

Some of the translation entries are short, some are long. It all depends upon what the user wants translated. Whenever the site is rendering, say, control labels or help text, it checks the Hashtable to see if it contains a translation for the English version and if it does then it uses the Hashtable version (assuming a foreign-language user - I use separate Application-level objects for each language).

A typical Hashtable entry might be: key='Things to do today', value='Mon Charge pour aujourd'hui'. The code looks up the table for 'Things to do today' - if a translation is found it uses it, if not it just uses the English version.

My question is this: is a Hashtable the most performant collection for this? Could I somehow improve performance/memory usage by using a different collection/key structure, or even using a different mechanism altogether? My thinking is that I'd rather use the Application object rather than doing distinct database reads for each translation - there may be tens of translations per page render.
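
For illustration, the lookup is currently along these lines (simplified; the Application key name and method are just examples):

    using System.Collections;
    using System.Web;

    // Simplified sketch of the current approach; names are illustrative.
    string Translate(string english, string languageCode)
    {
        // e.g. Application["Translations_fr"] holds a Hashtable of English -> French text
        var table = (Hashtable)HttpContext.Current.Application["Translations_" + languageCode];
        if (table != null && table.ContainsKey(english))
        {
            return (string)table[english];
        }
        return english; // fall back to the English version
    }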

A: 

If you want fast access you might try to grow a Trie
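
A rough, illustrative sketch of the idea:

    using System.Collections.Generic;

    // Illustrative trie keyed on the English text, with the translation
    // stored at the terminal node.
    class TrieNode
    {
        public Dictionary<char, TrieNode> Children = new Dictionary<char, TrieNode>();
        public string Translation; // non-null marks the end of a stored key
    }

    class TranslationTrie
    {
        private readonly TrieNode root = new TrieNode();

        public void Add(string english, string translation)
        {
            var node = root;
            foreach (char c in english)
            {
                TrieNode next;
                if (!node.Children.TryGetValue(c, out next))
                {
                    next = new TrieNode();
                    node.Children[c] = next;
                }
                node = next;
            }
            node.Translation = translation;
        }

        public string Lookup(string english)
        {
            var node = root;
            foreach (char c in english)
            {
                if (!node.Children.TryGetValue(c, out node))
                    return null; // not found, caller falls back to English
            }
            return node.Translation;
        }
    }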

Kamil Szot
A trie is very _inefficient_ here since the keys are much longer than in typical trie use, resulting in a huge trie. You waste memory, and performance doesn't improve either. Tries are useful for relatively short keys.
Henri
+1  A: 

I suggest an alternative approach - create a "script" which converts translations from your custom source (db or whatever you have) into .NET resource files, and add a command that runs it to the pre-build event of your project. This way you will be able to use the native localization functionality of the .NET platform and still store translations anywhere you want.
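
A rough sketch of what such a script could look like, using ResXResourceWriter (the connection string, table and column names are placeholders for whatever your schema actually uses):

    using System.Data.SqlClient;
    using System.Resources;

    // Rough sketch: dump translations from the database into a .resx file
    // as a pre-build step. Schema details below are placeholders.
    class ExportTranslations
    {
        static void Main()
        {
            using (var writer = new ResXResourceWriter("Strings.fr.resx"))
            using (var connection = new SqlConnection("...your connection string..."))
            {
                connection.Open();
                var command = new SqlCommand(
                    "SELECT EnglishText, FrenchText FROM Translations", connection);
                using (var reader = command.ExecuteReader())
                {
                    while (reader.Read())
                    {
                        writer.AddResource(reader.GetString(0), reader.GetString(1));
                    }
                }
            }
        }
    }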

As Iain Galloway commented:

It's almost always worth going with the grain of the platform, rather than against it.

Koistya Navin
+1. This will be more effort upfront, but will pay off. It's almost always worth going with the grain of the platform, rather than against it.
Iain Galloway
Although I agree that this is a much better design, I doubt whether it would increase performance. If anything it will decrease performance, since the native localization also uses a dictionary under the hood.
Henri
Thanks for your comments. I'm quite keen that users can add translations as they use the system (as they spot omissions, inaccuracies etc.). I can easily update the Application-object-based memory object, which makes the new translation instantly visible, so I'm not keen to adopt a more static approach.
DEH
DEH, you can update .resx files on the server as well (after the website was already deployed). Take a look at this blog post: http://geekswithblogs.net/vivek/archive/2006/12/14/101119.aspx
Koistya Navin
A: 

I think memory is not much of an issue, given that the number of bytes you have to store is more or less constant. The size of the key-value mapping is roughly the same no matter what mechanism you use.

However, for performance this does not hold. When using a key-value collection (e.g. a hashtable or dictionary), the hash used by the collection is calculated from the key. That is, if your key is "abcdefg", it is hashed using the GetHashCode() function. This function obviously needs more computational work the longer the input string is. You could, for example, make the hash computation O(1) in the length of the string by overriding the function (or supplying a custom comparer) and returning a hash code based on a fixed number of characters of the string. Of course this could break the balance of your hashtable, resulting in slower lookups. You should benchmark this to be sure.
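
With a generic Dictionary you could do this by supplying a custom IEqualityComparer&lt;string&gt; rather than touching String.GetHashCode() itself; an untested sketch:

    using System;
    using System.Collections.Generic;

    // Untested sketch: hash only the first 16 characters of each key so the
    // hashing cost is bounded, but compare full strings for equality so
    // lookups stay correct.
    class PrefixHashComparer : IEqualityComparer<string>
    {
        private const int PrefixLength = 16;

        public bool Equals(string x, string y)
        {
            return string.Equals(x, y, StringComparison.Ordinal);
        }

        public int GetHashCode(string s)
        {
            int length = Math.Min(s.Length, PrefixLength);
            return s.Substring(0, length).GetHashCode();
        }
    }

    // Usage:
    // var translations = new Dictionary<string, string>(new PrefixHashComparer());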

Another solution is to keep a dictionary/hashtable for each language and use a short abbreviation as the key. On lookup you then have a switch statement which picks the correct dictionary and extracts the string using the abbreviated key. The downside of this is that you increase memory usage, but in theory decrease lookup time. In this case too you have to benchmark in order to be sure whether there is any (positive) difference.
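
Roughly like this (the short keys, language codes and translated strings are only illustrative):

    using System.Collections.Generic;

    // Illustrative only: one dictionary per language, keyed by a short label
    // instead of the full English sentence.
    static class Translations
    {
        static readonly Dictionary<string, string> French = new Dictionary<string, string>
        {
            { "todo_title", "Mon Charge pour aujourd'hui" }
        };

        static readonly Dictionary<string, string> German = new Dictionary<string, string>
        {
            { "todo_title", "Meine Aufgaben" } // placeholder text
        };

        public static string Lookup(string shortKey, string language)
        {
            Dictionary<string, string> table;
            switch (language)
            {
                case "fr": table = French; break;
                case "de": table = German; break;
                default: return shortKey;
            }
            string value;
            return table.TryGetValue(shortKey, out value) ? value : shortKey; // fall back to the key
        }
    }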

As a side note, this sounds to me like premature optimization. I don't think such lookups will be a bottleneck in your application.

Henri