Localizing data that is generated dynamically

This was a hard question for me to summarize so we may need to edit this a bit.

Background:

About four years ago we had to translate our ASP.NET application for our clients in Mexico. Extensibility and scalability were not that much of a concern at the time (oh yes, I just said those dreadful words) because we only have U.S. and Mexican customers.

Rather than use resource files, we replaced every single piece of static text in our application with some type of server control (an ASP.NET label, for example). We store each and every English word in a SQL database. We have added the ability to translate the English text into another language and can also add cultural overrides. For example, hello can be translated to ¡hola! in one language and overridden to ¡bueno! in a different culture. The business has full control over these translations because we built management utilities for them to control everything. The translation kicks in when we detect that the user has a browser culture other than en-US. Every form descends from a base form that iterates through each server control and executes a translation (translation data for a culture is stored as a DataTable in an application variable). I'm still amazed at how fast the control iteration is.

The problem:

The business is very happy with how the translations work. In addition to the static content that I mentioned above, the business now wants to have certain data translated as well. System notes are a good example of what they want translated. Example: "Sent Letter #XXXX to Customer" - the business wants the "Sent Letter to Customer" text translated based on the user's browser culture.

I have read a couple of other posts on SO that talk about localization, but they don't address my problem. How do you translate a phrase that is dynamically generated? I could easily read the English text and translate "Sent", "Letter", "to" and "Customer" word by word, but I guarantee that it will look stupid to the end user because it's a phrase. And if we stored the phrase in English, less the dynamic text, the dynamic part of the system-generated note would screw up any look-ups that we perform on the phrase.

One thought I had... We don't have a table of system-generated note types. I suppose we could create one that had placeholders for dynamic data, and the translation engine would ignore the placeholder markers. The problem with this approach is that our SQL Server database is a replication of an old Pick database and we don't really know all the types of system-generated phrases (they are deep in the Pick code base, in subroutines, control files, etc.). Things like notes, ticklers, and payment rejection reasons are all stored differently. Trying to normalize this data has proven difficult. It would be a huge effort to go back and identify and change every Pick program that generates a message.

This question is very close but I'm not dealing with just system generated status messages, but rather an infinite number of phrases and types of phrases with no central generation mechanism.

Any ideas?

+1 A:

In a pinch I suppose you could try something like foisting the job off onto Google if you don't have a translation on hand for a particular phrase, and stashing the translation for later.

Stashing the translations for later provides both a data collection point for building a message catalog and a rough (if sometimes laughably wonky) dynamically built starter set of translations. Once you begin the process, track which translations have been reviewed and how frequently each has been hit. Frequently hit machine translations can then be reviewed and refined.
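A minimal sketch of that stash-and-review idea, assuming a `machine_translate(phrase, culture)` callable that stands in for whatever external service is used (the callable, the class names and the field names here are all hypothetical, not part of any real API):

```python
from dataclasses import dataclass
from typing import Callable, Dict, List, Tuple


@dataclass
class CachedTranslation:
    text: str
    reviewed: bool = False  # flipped to True once a human has checked the text
    hits: int = 0           # how many times this phrase has been requested


class TranslationCache:
    def __init__(self, machine_translate: Callable[[str, str], str]) -> None:
        # machine_translate is a hypothetical hook for an external service.
        self._machine_translate = machine_translate
        self._entries: Dict[Tuple[str, str], CachedTranslation] = {}

    def translate(self, phrase: str, culture: str) -> str:
        key = (phrase, culture)
        entry = self._entries.get(key)
        if entry is None:
            # No stored translation yet: ask the machine translator once and stash it.
            entry = CachedTranslation(self._machine_translate(phrase, culture))
            self._entries[key] = entry
        entry.hits += 1
        return entry.text

    def mark_reviewed(self, phrase: str, culture: str, corrected_text: str) -> None:
        # Replace the machine output with the human-approved text.
        entry = self._entries[(phrase, culture)]
        entry.text = corrected_text
        entry.reviewed = True

    def review_queue(self) -> List[Tuple[str, str, int]]:
        # Unreviewed entries, most frequently hit first.
        pending = [
            (phrase, culture, entry.hits)
            for (phrase, culture), entry in self._entries.items()
            if not entry.reviewed
        ]
        return sorted(pending, key=lambda item: item[2], reverse=True)
```

Note that this sketch keys on the whole phrase, so a note containing a letter number would produce one cache entry per number unless the dynamic part is stripped or replaced with a placeholder first.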

Jeffrey Hantin 2009-05-19 04:21:10

It sounds like dynamic translation might be the only option. If it weren't for "we don't really know all the types of system generated phrases (They are deep in the pic code base, in subroutines, control files, etc.)", I would just recommend gettext (http://en.wikipedia.org/wiki/Gettext), which has support for phrases.

Matthew Flaschen 2009-05-19 04:34:16

+1 A:

The lack of a "bottleneck" -- what you identify as the (missing) "central generation mechanism" -- is the architectural problem in this situation. Ideally, rearchitecting to put such a bottleneck in place (so you can keep using your general approach with a database of culture-appropriate renditions of messages, just with "placeholders" for e.g. the #XXXX in your example) would be best.

If that's just infeasible, you can place the "bottleneck" at the other end of the pipe -- when a message is about to be emitted. At that point, or those few points, you need to try and match the (English) string that's about to be emitted against a series of well-crafted regular expressions (with "placeholders" typically like (.*?)...) and thereby identify the appropriate key for the DB lookup. Yes, that still is a lot of work, but at least it should be feasible without the issues you mention wrt old translated pick code.
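As a rough illustration of that output-side matching, here is a sketch under some assumptions: the key names, the second pattern and the Spanish strings are made up for the example (and are not reviewed translations), and a dict stands in for the DB lookup.

```python
import re
from typing import Dict, List, Optional, Pattern, Tuple

# (key, pattern) pairs; the keys and the second pattern are hypothetical.
PATTERNS: List[Tuple[str, Pattern[str]]] = [
    ("note.sent_letter", re.compile(r"Sent Letter #(?P<letter_num>.+?) to Customer")),
    ("note.payment_rejected", re.compile(r"Payment rejected: (?P<reason>.+)")),
]

# Stand-in for the translation table; the Spanish text is illustrative only.
TEMPLATES: Dict[Tuple[str, str], str] = {
    ("note.sent_letter", "es"): "Se envió la carta #{letter_num} al cliente",
    ("note.payment_rejected", "es"): "Pago rechazado: {reason}",
}


def identify(message: str) -> Optional[Tuple[str, Dict[str, str]]]:
    """Return the lookup key and captured dynamic values, or None if nothing matches."""
    for key, pattern in PATTERNS:
        match = pattern.fullmatch(message)
        if match:
            return key, match.groupdict()
    return None


def localize(message: str, language: str) -> str:
    """Translate a recognized message; fall back to the original English text."""
    found = identify(message)
    if found is None:
        return message
    key, values = found
    template = TEMPLATES.get((key, language))
    if template is None:
        return message
    return template.format(**values)
```

Unmatched messages pass through untouched, so the pattern list can grow gradually; logging the misses would be one way to discover which phrases the old code actually emits.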

Alex Martelli 2009-05-19 04:35:19

+1 A:

We use the technique you propose, with insertion points.

"Sent letter #{0:Letter Num} to Customer {1:Customer Full Name}"

Which might be (in reverse Pig Latin, say):

"Ustomercay {1:Customer Full Name} asway entsay etterlay #{0:Letter Num}"

Note that this handles cases where the particular target language reverses the order of insertion, etc. It does not handle subtleties like first, second, etc., which have to be handled with application logic or more phrases:

"This is your {0:first, second, third} warning"
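A small sketch of how such templates could be filled in, assuming the text after the colon is only a description for translators and is ignored at run time (the `fill` helper and the sample values are hypothetical; this is not the actual implementation described above):

```python
import re
from typing import Sequence

# Matches {0}, {1:Customer Full Name}, etc. The part after ':' is treated
# as a translator-facing description and discarded.
PLACEHOLDER = re.compile(r"\{(\d+)(?::[^{}]*)?\}")


def fill(template: str, values: Sequence[object]) -> str:
    """Replace each numbered insertion point with the value at that index."""
    def substitute(match: "re.Match[str]") -> str:
        return str(values[int(match.group(1))])
    return PLACEHOLDER.sub(substitute, template)


ENGLISH = "Sent letter #{0:Letter Num} to Customer {1:Customer Full Name}"
PIG_LATIN = "Ustomercay {1:Customer Full Name} asway entsay etterlay #{0:Letter Num}"

values = ["1234", "Ana Ruiz"]  # made-up sample data
print(fill(ENGLISH, values))
print(fill(PIG_LATIN, values))
```

Because values are looked up by index rather than by position in the string, each translation is free to reorder the insertion points. An index with no matching value raises `IndexError` in this sketch.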

Cade Roux 2009-05-19 04:36:28

Dynamic machine translation is not suitable for a product that you actually expect people to pay money for. The only way to do it is with static templates containing insertion points (as Cade Roux has demonstrated in his answer).

There's no getting around a thorough refactoring of your code to make this feasible. The alternative is to do nothing with those phrases (which is what you're doing now, and it's working out okay, right?). Usually no translation is better than embarrassingly bad translation.

Mike Sickler 2009-05-19 15:29:16

In my experience in Japan, the users simply freeze up and stop working when they see English. In that case, I would think a bad translation is better than no translation. In other cultures I would agree.

Cade Roux 2009-05-19 16:04:17

Tolerance for English depends on the market segment in Japan, but I'm thinking that if the only English in the app is these secondary audit-type messages, like 'Updated by Mr. X on some date', then keeping them in English is better than bad Japanese. Better to say 'We haven't gotten around to that yet' than to say 'We got around to it, but did a lousy job'.

Mike Sickler 2009-05-19 17:49:41
