I'm trying to use the web-based Google Translate to translate my English files into another language. They contain format specifiers like %s and %d. Is there a way to protect them from being translated erroneously?

For instance, the text:

Athlete already exists with number %s

is translated to:

Athlète existe déjà avec nombre% s

while I would expect it to be translated to:

Athlète existe déjà avec nombre %s

(I'm processing the input and output, so I could add characters around them to 'escape' the %s and %d strings. I had already thought of replacing %s with some word I'm sure Google will not try to translate, but I hope there is a nicer solution.)

A:

Strange idea, but...

Replace each format specifier with a unique number wrapped in underscores (or anything else that survives translation unchanged and does not interfere with your use of numerals), like:

Athlete already exists with number %s => Athlete already exists with number _001_

Translate to Chinese: 運動員已經存在的號碼 _001_

After translating, if the format string contained more than one specifier, check that the numbers are still in the same order; if they are, replace each number with its original specifier.

Luther Blissett
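A minimal Python sketch of the placeholder round trip described in this answer. It assumes the message files only use plain %s and %d (no %%, widths or flags); the names `protect` and `restore` are made up for illustration, and the translation step itself is done by hand.

```python
import re

# Assumption: the message files only use the %s and %d conversions
# (no %%, widths, or other flags).
SPEC_RE = re.compile(r"%[sd]")
# Numbered placeholder in the style suggested above: _001_, _002_, ...
PLACEHOLDER_RE = re.compile(r"_(\d{3})_")


def protect(text):
    """Replace each specifier with _001_, _002_, ... in source order.

    Returns the protected text and the original specifiers, in order."""
    specs = []

    def to_placeholder(match):
        specs.append(match.group(0))
        return "_%03d_" % len(specs)

    return SPEC_RE.sub(to_placeholder, text), specs


def restore(translated, specs):
    """Swap the placeholders back, but only if none were lost or reordered."""
    numbers = [int(n) for n in PLACEHOLDER_RE.findall(translated)]
    if numbers != list(range(1, len(specs) + 1)):
        raise ValueError("placeholders missing or reordered: %r" % numbers)
    return PLACEHOLDER_RE.sub(lambda m: specs[int(m.group(1)) - 1], translated)


protected, specs = protect("Athlete already exists with number %s")
print(protected)  # Athlete already exists with number _001_
# The translated text comes back from Google Translate (pasted by hand here).
print(restore("運動員已經存在的號碼 _001_", specs))  # 運動員已經存在的號碼 %s
```

Raising an error instead of guessing keeps a mangled placeholder (for example `_001 _`) from silently producing a broken format string.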
This is similar to my comment suggesting .NET-style specifiers: {1}, {2}. The advantage is that the translation can change the order of the format specifiers and still work.
Patrick
Clever, and in fact you can translate back to XPG3 positional specifiers for easier application. But would it work with languages that change the tense (not quite the right word, but I'm no linguist) according to the cardinality of the number? IIRC, 1/many isn't the only distinguisher.
Donal Fellows
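Donal's idea of converting the numbered placeholders back into XPG3 positional specifiers (`%1$d`, `%2$s`) might look roughly like this in Python. It is a sketch under the same assumptions as the placeholder approach (only %s and %d, placeholders of the form `_001_`); `to_positional` is a hypothetical helper. Because each placeholder keeps its argument number, a translation that reorders the arguments is still usable.

```python
import re

# Numbered placeholders inserted before translation: _001_, _002_, ...
PLACEHOLDER_RE = re.compile(r"_(\d{3})_")


def to_positional(translated, specs):
    """Replace _00N_ with the XPG3 positional specifier %N$x.

    `specs` holds the original specifiers in source order, e.g. ["%d", "%s"],
    so a placeholder keeps its argument number wherever the translation put it."""
    found = sorted(int(n) for n in PLACEHOLDER_RE.findall(translated))
    if found != list(range(1, len(specs) + 1)):
        raise ValueError("placeholder lost or duplicated: %r" % found)

    def to_spec(match):
        n = int(match.group(1))
        conversion = specs[n - 1][-1]  # the conversion letter, 's' or 'd'
        return "%%%d$%s" % (n, conversion)

    return PLACEHOLDER_RE.sub(to_spec, translated)


# Hypothetical translation in which the two arguments swapped places.
print(to_positional("Fichier _002_, ligne _001_", ["%d", "%s"]))
# Fichier %2$s, ligne %1$d
```

Tcl's `format` command, which `msgcat::mc` uses when extra arguments are passed, accepts these `%n$` specifiers. When pasting them into a Tcl message catalog, put the string in braces (or write `\$`) so that Tcl does not try to substitute `$s` as a variable.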
Donal, when translating, try to avoid any use of masculine/feminine (especially when translating to French, German, ...) and singular/plural. Note that some languages (Slovenian, for example) even use a different suffix for two than for many.
Patrick
@Patrick: Precisely my point. I knew they existed, but didn't know which they were.
Donal Fellows
The use of underscores (_) is smart: they are not part of any vocabulary, but a machine considers them part of the alphabet. Since the translation to Chinese works, it will probably work for most other languages. I'll give it a try!
Roalt
My idea in using Google Translate is to get a good 'first shot' translation. A native translator will then have an easier time refining the list by hand for better linguistic quality.
Roalt
A: 

I would recommend translating each part of the string individually and then adding the C format tokens back. You may get a less accurate translation, but that's the risk of using automated translators.

And there are always beta testers :)

A better idea: change %d to an arbitrary integer and %s to an arbitrary Latin-script string that Google will not translate (a rare family name usually does the trick), and so on for any other specifiers.

liorda
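A rough Python sketch of the sentinel-value idea from this answer. The surname `Zbyszewski` and the integer `48213` are arbitrary, hypothetical choices, and the helpers assume only %s and %d occur. The count check catches a sentinel the translator altered (for instance a number reformatted as "48 213"), but two %s specifiers share one sentinel, so a reordering between them cannot be detected.

```python
# Hypothetical sentinels: an uncommon surname for %s and an unusual
# integer for %d. Choose values that never appear in real messages.
SENTINELS = {"%s": "Zbyszewski", "%d": "48213"}


def to_sentinels(text):
    """Replace every specifier with its sentinel before translation."""
    for spec, value in SENTINELS.items():
        text = text.replace(spec, value)
    return text


def from_sentinels(original, translated):
    """Map sentinels back to specifiers, checking none were lost or altered."""
    for spec, value in SENTINELS.items():
        if translated.count(value) != original.count(spec):
            raise ValueError("sentinel for %s was altered by the translator" % spec)
        translated = translated.replace(value, spec)
    return translated


source = "Error reading line %d of file %s"
print(to_sentinels(source))  # Error reading line 48213 of file Zbyszewski
# Hypothetical French result returned by the translator:
french = "Erreur de lecture de la ligne 48213 du fichier Zbyszewski"
print(from_sentinels(source, french))  # Erreur de lecture de la ligne %d du fichier %s
```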
Reminds me of what we did 20 years ago in our company. Imagine a string like "Error reading line %d of file %s, character %c is invalid". You can't seriously translate all the parts "Error reading line", "of file", ", character" and "is invalid" to another language. This will only generate rubbish. You don't even want to give this to your beta-testers.
Patrick
A:

Have you restructured your program to use the msgcat package to handle the strings yet? Its documentation covers most of the salient points, including how to handle a varying order of replacements. The only vaguely tricky bit is that you'll need to handle the way the % symbols get moved around; if the amount of text being processed is small enough, you could even do that by hand or with a little mechanical assistance (vi, Emacs and Eclipse can all do the sort of match/replace required; other editors probably can too, but I don't use those).

Donal Fellows
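The match/replace mentioned above could also be scripted. This Python sketch only targets the kind of damage shown in the question (`nombre% s` instead of `nombre %s`); the patterns are assumptions built from that single example, `repair` is a hypothetical helper, and the second step assumes a specifier is never meant to be glued to the preceding word.

```python
import re


def repair(text):
    """Undo the specifier mangling seen in the question ("nombre% s").

    Assumption: only %s and %d are used; other damage is not handled."""
    # "% s" -> "%s": re-attach a conversion letter split off by whitespace.
    text = re.sub(r"%\s+([sd])\b", r"%\1", text)
    # "nombre%s" -> "nombre %s": restore the space before a glued '%'.
    text = re.sub(r"(\w)%([sd])\b", r"\1 %\2", text)
    return text


print(repair("Athlète existe déjà avec nombre% s"))
# Athlète existe déjà avec nombre %s
```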
Yes, I use the msgcat package to do the translation. At the moment, the number of strings that actually have multiple %s instances in a sentence is limited, and in my initial translation from English to Dutch I did not encounter problems (yet). Maybe with French I will run into them, though!
Roalt