I'm looking for a culturally-sensitive way to properly insert a noun into a sentence while using the appropriate article (a/an). It could use String.Format, or possibly something else if the appropriate way to do this exists elsewhere.

For example:

Base Sentence: "You are looking at a/an {0}"

This should format to: "You are looking at a carrot" or "You are looking at an egg."

I'm currently doing this by manually checking the first character of the word to be inserted and then manually inserting "a" or "an." But I'm concerned that this might limit me when the application is localized to other languages.

Is there a best practice for approaching this problem?

RESOLUTION: It appears that the problem is complicated to the point that there does not exist a utility or framework to solve this problem in the way I originally phrased. It appears that the best solution (in my situation) is to store the article in the database along with the noun so that the translators can have the level of control they need. Thanks for all of the suggestions!

+7  A: 

The problem is that even in English, a-vs-an is determined not by the starting letter but by the starting sound. You can make a pretty good guess in English based on the starting letter, but there are exceptions (e.g. "hour", "user").

The best thing to do is have access to a word's pronunciation to be able to choose.

Barring that, the next best thing to do is have a list of common exceptions, then guess the rest.
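A minimal sketch of the "exceptions first, then guess" approach, written in Python for brevity (the same logic would sit in front of String.Format). The exception sets and the function names `indefinite_article` and `describe` are hypothetical and only illustrate the idea; a real list would need to be much longer, and this covers English only.

```python
# Hypothetical exception lists; a real application would need far larger ones.
AN_EXCEPTIONS = {"hour", "honest", "honor", "heir"}      # silent "h": vowel sound
A_EXCEPTIONS = {"user", "unicorn", "university", "one"}  # "y"/"w" sound: consonant sound


def indefinite_article(word: str) -> str:
    """Guess "a" or "an" for an English word: exceptions first, then first letter."""
    lowered = word.strip().lower()
    if not lowered:
        raise ValueError("word must not be empty")
    if lowered in AN_EXCEPTIONS:
        return "an"
    if lowered in A_EXCEPTIONS:
        return "a"
    # Fallback guess based on spelling, which is wrong for unlisted exceptions.
    return "an" if lowered[0] in "aeiou" else "a"


def describe(noun: str) -> str:
    """Fill the base sentence from the question with the guessed article."""
    return "You are looking at {0} {1}".format(indefinite_article(noun), noun)
```

With these lists, `describe("carrot")` gives "You are looking at a carrot" and `describe("hour")` gives "You are looking at an hour". Any word missing from the lists falls back to the spelling guess, so the result is only as good as the exception data.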


From Wikipedia:

The choice of "a" or "an" is determined by phonetic rules rather than by spelling convention. "An" is employed in speech to remove the awkward glottal stop (momentary silent pause) that is otherwise required between "a" and a following word e.g. "an X-ray" ... The following paragraphs are spelling rules for "an" which can be used if the phonetic rule is not understood.

lc
+4  A: 

Besides the problem noted by lc (an hour, a hat), grammar rules vary widely between languages. For instance, many Latin-based languages switch the article based on the 'gender' and 'number' of the noun. These can sometimes be inferred from the last few characters of the word, but that has the same problem as in English: there are many exceptions.

If you are talking about localizing an interface I would store the article with the noun for the interface element in each language. If you are processing user input I don't see an easy way to do this.
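As a rough illustration of storing the article with the noun, here is a small Python sketch. The table `NOUN_PHRASES`, the keys, and the helper `noun_phrase` are all hypothetical; in practice the entries would live in resource files or a database and be written by translators. Note that each stored phrase fits one grammatical context only (here, the object of "looking at").

```python
# Hypothetical resource table: every entry is supplied by a translator,
# article included, so the code never has to guess it.
NOUN_PHRASES = {
    "en": {"carrot": "a carrot", "egg": "an egg", "hour": "an hour"},
    "de": {"carrot": "eine Karotte", "egg": "ein Ei", "hour": "eine Stunde"},
}

# Hypothetical sentence templates, also one per language.
LOOKING_AT = {
    "en": "You are looking at {0}",
    "de": "Du siehst {0}",
}


def noun_phrase(locale: str, key: str) -> str:
    """Return the translator-supplied noun phrase, article included."""
    return NOUN_PHRASES[locale][key]


def looking_at(locale: str, key: str) -> str:
    """Insert the stored noun phrase into the sentence for that language."""
    return LOOKING_AT[locale].format(noun_phrase(locale, key))
```

The point of the design is that the formatting code contains no grammar at all; it only looks things up.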

Gary.Ray
A: 

There is a library called .NET inflector, used for pluralization/singularization, that could potentially be repurposed or extended in the same manner for this. What you are looking for is a general rule that you can then tweak with exceptions, for example words that start with letter combinations that are generally pronounced as vowels. Although this is not, strictly speaking, culturally sensitive, it should get you started in English at least.

BenAlabaster
+2  A: 

Other languages have similar problems in different contexts, for example depending on the gender of the object or on the number of objects. (Some languages do not stop at singular versus plural: in Russian, for example, the noun takes different forms for one object, for two to four objects, and for five or more.)

To use an example from the German language:

The equivalent of "You are looking at a(n) X" would be:

"Du siehst ein X"
or "Du siehst eine X"
or "Du siehst einen X"

depending on the gender of the object X. And the gender of an object is something you cannot guess; it is attached to the name of the object purely by historical tradition.

So, there is no simple way to implement such distinctions in most languages. My advice would be to always use formulations that will work in all cases, e.g.

"You are looking at an object of type X"
or "You are looking at a(n) X"
or "You are looking at one X"

NineBerry
+1  A: 

How is this culturally sensitive? What you're trying to do only makes sense in English. Other languages may require completely different changes to the sentence. Some may modify the noun, some may change the sentence structure, some may reorder the words and so on.

What you're trying to do will break when localized to other languages. There is no automatic way to solve this.

As a simple example, consider what happens when you try to write "the {something}". In English, you simply prefix the word with "the" and all is well. In French, you prefix it with le, la or les, depending on gender and whether or not it's plural. In Danish, you instead add an 'en' or 'et' suffix. So where the word "table" translates to "bord", "the table" becomes "bordet". And "chair" (stol) becomes "stolen".

The only meaningful way around this is to give the translator control over the entire sentence. Don't assume you can just plug in a few culture-dependent words here and there.
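One way this could look, as a hedged Python sketch: the translator writes the whole sentence for each language and also supplies each noun in the form that sentence needs. The tables `MESSAGES` and `NOUNS`, the key names, and the `render` helper are hypothetical, and only the definite form from the example above is modelled.

```python
# Hypothetical per-language sentences, each written in full by a translator.
MESSAGES = {
    "en": {"look_at_definite": "You are looking at {definite}."},
    "fr": {"look_at_definite": "Vous regardez {definite}."},
    "da": {"look_at_definite": "Du ser på {definite}."},
}

# Hypothetical per-language noun forms; the article may be a prefix (English,
# French) or a suffix (Danish), and the code does not need to know which.
NOUNS = {
    "en": {"table": {"definite": "the table"}, "chair": {"definite": "the chair"}},
    "fr": {"table": {"definite": "la table"}, "chair": {"definite": "la chaise"}},
    "da": {"table": {"definite": "bordet"}, "chair": {"definite": "stolen"}},
}


def render(locale: str, message_key: str, noun_key: str) -> str:
    """Fill a translator-written sentence with translator-written noun forms."""
    template = MESSAGES[locale][message_key]
    return template.format(**NOUNS[locale][noun_key])
```

Even this only works for languages where the noun can be dropped into a fixed slot, so a new language may still require new forms or new templates.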

So in practice, the best solution may be to accept that your application can't be trivially localized. Perhaps you can ensure that it's possible for a few key languages, and other than that, ignore the problem. Then, if you need to translate the application to language X at some point in the future, work with the translator to make the changes that are necessary in the application code.

Alternatively, you'd have to completely give up on trying to construct sentences yourself. The only way to ensure that a sentence is correct is to have a translator write the whole sentence. And of course, that causes problems when you want to be able to swap individual words.

jalf
Well, it would make sense in Swedish. And since one can never know whether it makes sense in *any* language, it's not a completely ridiculous approach. But I agree that the translation should be on a per-phrase basis, not a per-noun. Some languages may not even have nouns, right?
bzlm
I'm quite aware that this approach is limited to English (and, as others have pointed out, it doesn't even solve all English scenarios). That's why I opened this question.
DivisionByZorro
It is a ridiculous approach if you expect it to *work*. He asked for a culture-sensitive approach, and in that context, it is ridiculous. In a more limited context of "it works in English, and I'm happy if I can translate it to, say, German, Spanish and Russian as well" it can work, but then it no longer respects culture *in general*.
jalf
@DivisionByZorro: The problem is that no solution *exists* that works for all languages (other than "let the translator translate the entire sentence"). It's not just that your suggested solution won't work, but that the problem is completely different in different languages. The entire approach is flawed if you want localization to work *in general*.
jalf
+3  A: 

In the days of old text adventure computer games, one way to solve this was to make "a", "an", "the", etc. part of the actual name of the thing. When you see "a mailbox" in these games, there is no smart AI determining the "a". The name of the object is "a mailbox". Some games would say "you open a mailbox", which sounds dumb; others would say "you open the mailbox", which means a second name was entered for the object, to be used in different grammatical contexts. I suggest you go this route instead of the route you're on now.
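A small sketch of that idea in Python, assuming each object simply carries one hand-written name per grammatical context. The class `GameObject`, its field names, and the two message helpers are hypothetical.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class GameObject:
    """An object whose names are stored with the article already attached."""
    indefinite: str  # used when the object is first mentioned, e.g. "a mailbox"
    definite: str    # used when the player acts on it, e.g. "the mailbox"


MAILBOX = GameObject(indefinite="a mailbox", definite="the mailbox")


def on_see(obj: GameObject) -> str:
    """Message for noticing an object: uses the indefinite name."""
    return "You see {0}.".format(obj.indefinite)


def on_open(obj: GameObject) -> str:
    """Message for opening an object: uses the definite name."""
    return "You open {0}.".format(obj.definite)
```

Nothing here computes an article; whoever authors the object writes both names, which is what makes the approach translatable.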

bzlm