I have to create software that must work on several *nix platforms (Linux, AIX, ...).

I need to handle internationalization and my translation strings are in the following form:

"Hi %1, you are %2." // English
"Vous êtes %2, bonjour %1 !" // French

Here %1 stands for the name and %2 for another word. I may change the format; that's not an issue.

I tried to use printf(), but you cannot specify the order of the parameters; you only specify their types.

"Hi %s, you are %s"
"Vous êtes %s, bonjour %s !"

Now there is no way to specify which argument replaces each %s: printf() just uses the first one, then the next.

Is there an alternative to printf() that deals with this?

Note: gettext() is not an option.

+11  A: 

Boost.Format supports this, in a similar way to Python; however, it is a C++ library.

Vinzenz
boost::format is the best way because it is also typesafe. The implementation is not optimized, however, and calls through to printf under the covers (or did last time I looked), so it is slower than using C printf directly by a factor of 2-3.
dajames
+20  A: 

POSIX printf() supports positional arguments.

printf("Hi %1$s, you are %2$s.", name, status);
printf("Vous êtes %2$s, bonjour %1$s !", name, status);
Ignacio Vazquez-Abrams
Cool! I had no idea! Any reference?
Amigable Clark Kant
@Amigable: `man 3p printf`
Ignacio Vazquez-Abrams
Oh. But I must say I don't understand the manpage completely. How can `printf("%*d", width, num);` and `printf("%2$*1$d", width, num);` be equivalent if arguments are numbered left to right?
Amigable Clark Kant
@Amigable, it does make sense if you think about it. This is _one_ thing being printed. `%2$*1$d` breaks down into `(%2$(*1$)d)` where the inner parenthesised bit specifies that param1 is to be used as the width for the parameter given by param2. It's equivalent to `%*d` breaking down into `(%(*)d)` with sequential assigning of param1 and param2.
paxdiablo
+8  A: 

You want the %n$s extension that is common to most Unix systems.

"Hi %1$s, you are %2$s."

See the German example at the bottom of the printf man page.

Regards DaveF

David Allan Finch
wow, posting several seconds later is the difference between 6 upvotes and 0.
David Allan Finch
@David Allan Finch: I upvoted for fairness. But it happens to me all the time ;)
ereOn
@ereOn: not a whinge really, just shock that posting first, even by a few seconds, matters so much: it's now 11 vs 2 for the same answer.
David Allan Finch
It's also a matter of reputation, I believe. I got many more upvotes as soon as I got my gold badge and 1000+ reputation.
ereOn
And what is essentially a comment (interesting, but nevertheless a comment) gets a nice +9...
shodanex
Indeed, the world is not fair. So I upvoted to try and make it more so. :) Though I do think paxdiablo had the more important answer ("Don't do this, it's not really what you want.")
Hostile Fork
I think there is no general solution that can cover all languages.
David Allan Finch
Thanks "Hostile Fork", yep, I know the world is unfair and I know my answer is no more or less deserving; I was just shocked by how a few seconds and maybe two orders of magnitude more rep make all that difference for saying the same thing ;)
David Allan Finch
+16  A: 

I don't mean to be the bearer of bad tidings, but what you're proposing is actually a bad idea. I work for a company that takes i18n very seriously and we've discovered (painfully) that you cannot just slot words into sentences like that, since the results often make no sense.

What we do is simply disconnect the error text from the variable bits altogether, to avoid these problems. For example, we'll generate an error:

XYZ-E-1002 Frobozz not configured for multiple zorkmids (F22, 7).

And then, in the description of the error, you state simply that the two values in the parentheses at the end were the Frobozz identifier and the number of zorkmids you tried to inflict on it.

This leaves i18n translation as an incredibly easy task since you have, at translation time, all of the language elements you need without worrying whether the variable bits should be singular or plural, masculine or feminine, first, second, or third declension (whatever the heck that actually means).

The translation team simply has to convert "Frobozz not configured for multiple zorkmids" and that's a lot easier.


For those who would like to see a concrete example, I have something back from our translation bods (with enough stuff changed to protect the guilty).

At some point, someone submitted the following:

The {name} {object} is invalid

where {name} was the name of an object (customers, orders, etc.) and {object} was the object type itself (table, file, document, stored procedure, etc.).

Simple enough for English, the primary (probably only) language of the developers, but they struck a problem when translating to German/Swiss-German.

While the "customers document" translated correctly (in a positional sense) to Kundendokument, the fact that the format string had a space between the two words was an issue. That was basically because the developers were trying to get the sentence to sound more natural but, unfortunately, only more natural based on their limited experience.

A bigger problem was with the "customers stored procedure" which became gespeichertes Verfahren der Kunden, literally "stored procedure of the customers". While the German customers may have put up with a space in Kunden dokument, there is no way to impose gespeichertes Verfahren der Kunden onto {name} {object} successfully.

Now you may say that a cleverer format string would have fixed this, but there are several reasons why that would be incorrect:

  • this is a very simple example; there are likely to be others that are more complex (I'd try to get some examples, but our translation bods have made it clear they have more pressing work than to submit themselves to my every whim).
  • the whole point of the format strings is to externalise translation. If the format strings themselves are specific to the translation target, you've gained very little by externalising the text.
  • developers should not have to concern themselves with format strings like {possible-pre-adjectives} {possible-pre-owner} {object} {possible-post-adjectives} {possible-post-owner} {possible-postowner-adjectives}. That is the job of the translation teams since they understand the nuances.

Note that introducing the disconnect sidesteps this issue nicely:

The object specified by <parameter 1>, of type <parameter 2>, is invalid.
    Parameter 1 = {name}.
    Parameter 2 = {object}.
Der sache nannte <parameter 1>, dessen art <parameter 2> ist, ist falsch. 
    Parameter 1 = {name}.
    Parameter 2 = {object}.

That last translation was one of mine, please don't use it to impugn the quality of our translators. No doubt more fluent German speakers will get a good laugh out of it.

paxdiablo
+1 for the good advice. I can't make this decision myself, but I will try to explain your point to them.
ereOn
@paxdiablo: I would be interested in a real-world example since I can’t think of a case where this wouldn’t work (given meaningful words/identifiers to slot into the sentence). I’ve actually written software this way so if you can point out caveats of the method, please do.
Konrad Rudolph
@Konrad Rudolph: Any situation where a grammatical object splits in two in one language but not in another. Consider for example `printf("I %s know", iKnow ? "" : "don't");` The French equivalents to "I know" and "I don't know" are "Je sais" and "Je ne sais pas". The negative won't fit the template that works fine for English.
JeremyP
@JeremyP: But that's more a result of sloppiness, and it's why people are discouraged from injecting words that way instead of using complete sentences/phrases.
Ignacio Vazquez-Abrams
@pax: And how do non-technical users respond to messages like, "I'm sorry, we're out of stock for one of the items you ordered. The item, and the time it should be available by, are at the end of this message. (Widget23, Friday)"? ;-) Not saying your idea is a bad one, just that an error message directed at people who actually read manuals isn't a particularly difficult case, as i18n goes.
Steve Jessop
@Jeremy: Like Ignacio I don’t consider this a valid case. The word(s) “don’t” can’t be injected into a localized text at all, since it’s not localized itself. As far as I’m concerned, only single real words in the lexical sense may ever be inserted, which means that such a problem won’t arise.
Konrad Rudolph
@Konrad: A more common issue is that a naive English-speaker would create the template `"%d %s %s hanging on the wall"`, with expected values including green/blue/pink and bottle(s)/elephant(s). But you discover that in order to localise the second argument, you need to know the gender of the third argument. Both are "single real words", but they're not sufficiently independent to be inserted separately, and even in English the word to insert depends on the value for `%d`. Sometimes you can only insert one "thing" into a sentence, and translation needs to be smarter than `printf`.
Steve Jessop
@pax: declension means the way that nouns alter according to the case they appear in (which often depends on some preposition). There are few examples left in English, but pronouns still decline: there's I/me, she/her, and if you're being formal, "who" in the accusative case becomes "whom" - "Whom should I contact?". Latin has 6 cases and 5 "declensions". That is, there are 5 different groups a noun can belong to, and once you know the noun's stem, and its declension, and its gender, that tells you how the noun transforms in each of the 6 cases. `printf` doesn't handle this well ;-)
Steve Jessop
Steve, on top of gender issues, there's the ordering problem. The English "the red table" is the French "la table rouge". An English-speaking developer shouldn't need to know all the nuances of the twenty-odd locales we have to translate for. That's why it's done the way it is in our shop (thanks for the education in your last comment, by the way; I've always wondered about that but we didn't get to do much Latin in a country school). @Konrad, I'll ask our Japanese/German translation bods for a specific example tomorrow when I'm back at work.
paxdiablo
@pax: sure, but order is the one problem that `printf` does solve by itself, if you use a version with positional arguments. Everything else you mentioned (plural, gender, declension) requires actual logic in order to calculate the correct sentence from the variable arguments provided. As soon as the logic varies by language, printf-based localisation only gets you so far.
Steve Jessop
@pax: thanks a lot.
Konrad Rudolph
@Konrad Rudolph: It's just an example to show you the possible pitfalls. Also, in reality, you would localise whatever you are injecting. And if you are assuming that whatever you can substitute as a single word in your native language can be substituted as a single word in any other language, you are wrong.
JeremyP
@Jeremy: Notice that I explicitly defined “words” as in the lexical sense. By this I mean atomic units regardless of language. So for example, a *file path* will always be the same unit, so having a localized string “`File %s was not found.`” should always be safe. Likewise for entities in the business context of the program (e.g. object names): “`Are you sure that you want to move %s to the trash?`”. Perhaps I should have been more explicit (“lexical” was a lousy term to use) but my question basically is: can *these* kinds of texts really make problems? If so, I’d really like to see examples.
Konrad Rudolph
@Konrad Rudolph: Actually you said "given meaningful words/identifiers". You didn't restrict the problem domain to simple things like just inserting a file path. Of course, if you do, then this kind of substitution does work.
JeremyP
You do realize that "scheiße" is an expletive on the level of "fuck" and "shit", don't you?
Sebastian N.
Closer to the latter than the former (I consider those two swear words at different "levels" of profanity). It was hard to find a German word that meant "crappy" so I had to make do. In any case, I've changed it in case anyone's offended.
paxdiablo