I'm working on a web project that will (hopefully) be available in several languages one day (I say "hopefully" because while we only have an English language site planned today, other products of my company are multilingual and I am hoping we are successful enough to need that too).

I understand that the best practice (I'm using Java, Spring MVC, and Velocity here) is to put all text that the user will see in external files, and refer to them in the UI files by name, such as:

#in messages_en.properties:
welcome.header = Welcome to AppName!

#in the markup
<title>#springMessage("welcome.header")</title>

But, having never had to go through this process on a project myself before, I'm curious what the best way to deal with this is when you have some segments of the UI that are heavy on markup, such as:

<p>We are excited to announce that Company1 has been acquired by
<a href="http://www.companydivisionx.com" class="boldLink">Division X</a>,
a fast-growing division of <a href="http://www.company2.com" class="boldLink">Company 2</a>, Inc. 
(Nasdaq: <a href="http://finance.google.com/finance?q=blah" class="boldLink">BLAH</a>), based in...

One option I can think of would be to store this "low-level" markup in messages.properties as part of the message itself, but this seems like the worst possible option.

Other options that I can think of are:

  • Store each non-markup inner fragment in messages.properties, such as acquisitionAnnounce1, acquisitionAnnounce2, acquisitionAnnounce3. This seems very tedious though.
  • Break this message into more reusable components, such as Company1.name, Company2.name, Company2.ticker, etc., as each of these is likely reused in many other messages. This would probably account for 80% of the words in this particular message.

Are there any best practices for dealing with internationalizing text that is heavy with markup such as this? Do you just have to bite down and bear the pain of breaking up every piece of text? What is the best solution from any projects you've personally dealt with?

+2  A: 

First off, don't split up your strings. This makes it much harder for localizers to translate text because they can't see the entire string to translate.

I would probably try to use placeholders around the links:

<a href="%link1%" class="%link1class%">Division X</a>

That's how I did it when I was localizing a site into 30 languages. It's not perfect, but it works.

I don't think it's possible (or easy) to remove all markup from strings; you need a way to insert the URLs and any extra markup.
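A minimal sketch of how such placeholders could be filled in at render time. The `fill` helper and the `LinkPlaceholders` class are hypothetical names for illustration, not part of Spring or Velocity; the URL and class values are taken from the question's markup.

```java
import java.util.LinkedHashMap;
import java.util.Map;

public class LinkPlaceholders {
    // Replaces every %name% token in the translated string with its value.
    static String fill(String translated, Map<String, String> values) {
        String result = translated;
        for (Map.Entry<String, String> entry : values.entrySet()) {
            result = result.replace("%" + entry.getKey() + "%", entry.getValue());
        }
        return result;
    }

    public static void main(String[] args) {
        // The translator sees the whole sentence; only URL and CSS class are injected.
        String message =
            "Acquired by <a href=\"%link1%\" class=\"%link1class%\">Division X</a>";
        Map<String, String> values = new LinkedHashMap<>();
        values.put("link1", "http://www.companydivisionx.com");
        values.put("link1class", "boldLink");
        System.out.println(fill(message, values));
    }
}
```

The translator can still move the link anywhere in the sentence, because the whole string, tag included, stays in one message.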

Ryan Doherty
+5  A: 

Typically if you use a template engine such as Sitemesh or Velocity you can manage these smaller HTML building blocks as subtemplates more effectively.

By doing so, you can incrementally boil the purely internationalized strings down into groups that map onto those markup subtemplates. Having done this sort of work with templates for an app that spanned multiple languages in the same locale, as well as multiple locales, we never placed markup in our message bundles.

I'd suggest that a key good practice would be to avoid placing markup (even at a low level, as you put it) inside message properties files at all costs! The potential this has for unleashing hell is not something to be overlooked: biting the bullet and breaking things up correctly is far less of a pain than having to manage many files with scattered HTML markup. It's important that you can visualise markup as holistic chunks, and scattering it everywhere would make everyday development a chore since:

  • You would lose IDE color highlighting and syntax validation
  • There is a high chance that one locale file or another gets missed when changes to designs / markup filter down

Breaking things down (to a realistic point, e.g. logical sentence structures but no finer) is somewhat hard work up front but worth the effort.

Regarding string breakdown granularity, here's a sample of what we did:

 comment.atom-details=Subscribe To Comments
 comment.username-mandatory=You must supply your name
 comment.useremail-mandatory=You must supply your email address 
 comment.email.notification=Dear {0}, the comment thread you are watching has been updated.
 comment.feed.title=Comments on {0}
 comment.feed.title.default=Comments
 comment.feed.entry.title=Comment on {0} at {1,date,medium} {2,time,HH:mm} by {3}


 comment.atom-details=Suscribir a Comentarios
 comment.username-mandatory=Debes indicar tu nombre
 comment.useremail-mandatory=Debes indicar tu direcci\u00f3n de correo electr\u00f3nico
 comment.email.notification=La conversaci\u00f3n que estas viendo ha sido actualizada
 comment.feed.title=Comentarios sobre {0}
 comment.feed.title.default=Comentarios
 comment.feed.entry.title=Comentarios sobre {0} a {1,date,medium} {2,time,HH:mm} por {3}

So you can do interesting things with how you string-replace in the message bundle, which may also help you preserve a sentence's logical meaning while still letting you manipulate it mid-sentence.
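The `{0}`, `{1,date,medium}` and `{2,time,HH:mm}` markers in the sample follow the `java.text.MessageFormat` syntax. A small sketch of how such a pattern could be resolved, assuming the patterns are passed in directly rather than loaded from the bundle (the `FeedTitles` class, the `format` helper and the argument values are made up for illustration):

```java
import java.text.MessageFormat;
import java.util.Date;
import java.util.Locale;

public class FeedTitles {
    // Formats a bundle pattern for the given locale; args fill {0}, {1}, ...
    static String format(String pattern, Locale locale, Object... args) {
        return new MessageFormat(pattern, locale).format(args);
    }

    public static void main(String[] args) {
        String en = "Comment on {0} at {1,date,medium} {2,time,HH:mm} by {3}";
        String es = "Comentarios sobre {0} a {1,date,medium} {2,time,HH:mm} por {3}";
        Date now = new Date();
        // The same arguments are rendered with locale-specific date formatting.
        System.out.println(format(en, Locale.ENGLISH, "Release notes", now, now, "matt"));
        System.out.println(format(es, Locale.forLanguageTag("es"), "Release notes", now, now, "matt"));
    }
}
```

One thing to watch for: `MessageFormat` treats a single quote as an escape character, so an apostrophe in a pattern has to be written as two single quotes.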

j pimmel
Not sure if I fully understand the benefit of managing these smaller building blocks as subtemplates. Won't all of these strings from subtemplates still go into the same messages.properties?
matt b
Yep, you will still have lots of strings. I guess my point is you can't really avoid it. One strategy to keep the volume of message strings down would be to place message bundle strings in the database where it's sensible to do so.
j pimmel
Looking at our message bundles, we have 1230 lines of strings. That's all pure site text that couldn't otherwise be surfaced from the DB.
j pimmel
A: 

You should avoid breaking up your strings. Not only does this become a nightmare to translate, but it also makes grammatical assumptions which may not be correct in the target language.

While placeholders can be helpful for many things, I would not recommend using placeholders for URLs. Keeping the URL inside the localized string lets you customize it for each locale. After all, there's no sense sending users to an English-language page when their locale is Argentine Spanish!
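As a rough sketch of that idea, each locale's bundle can carry its own complete link, with anything it doesn't override falling back to the default bundle. This imitates bundle fallback with plain `java.util.Properties` defaults rather than Spring's message source; the keys, the `LocalizedLinks` class and the example.com URLs are invented for illustration.

```java
import java.io.IOException;
import java.io.StringReader;
import java.io.UncheckedIOException;
import java.util.Properties;

public class LocalizedLinks {
    // Loads bundle text; keys missing from it fall back to the parent bundle.
    static Properties load(String text, Properties parent) {
        Properties props = new Properties(parent);
        try {
            props.load(new StringReader(text));
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return props;
    }

    public static void main(String[] args) {
        Properties en = load(
            "help.link=<a href=\"http://www.example.com/en/help\">Get help</a>\n"
            + "help.title=Help\n", null);
        // The Argentine Spanish bundle overrides the link text and its URL together.
        Properties esAR = load(
            "help.link=<a href=\"http://www.example.com/es-ar/ayuda\">Obtener ayuda</a>\n", en);
        System.out.println(esAR.getProperty("help.link"));  // localized link
        System.out.println(esAR.getProperty("help.title")); // falls back to English
    }
}
```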

Robert J. Walker
+2  A: 

As others have said, please never split the strings into segments. You will cause translators grief as they have to coerce their language syntax to the ad-hoc rules you inadvertently create. Often it will not be possible to provide a grammatically correct translation, especially if you reuse certain segments in different contexts.
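To illustrate why whole sentences matter, here is a small sketch using `java.text.MessageFormat`. The patterns and the `WholeSentence` class are made up for this example; the point is only that a translator can reorder the arguments freely, which concatenated fragments would not allow.

```java
import java.text.MessageFormat;

public class WholeSentence {
    // {0} is the acquired company, {1} is the buyer; each pattern decides the order.
    static String announce(String pattern, String acquired, String buyer) {
        return MessageFormat.format(pattern, acquired, buyer);
    }

    public static void main(String[] args) {
        String en = "{0} has been acquired by {1}.";
        // German puts the buyer first and the verb last: a different word order.
        String de = "{1} hat {0} \u00fcbernommen.";
        System.out.println(announce(en, "Company1", "Division X"));
        System.out.println(announce(de, "Company1", "Division X"));
    }
}
```

With fragments such as "has been acquired by" stored as separate keys, the German translator would have no way to move the verb to the end of the sentence.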

Do not remove the markup, either.

Please do not assume professional translators work in Notepad :) Computer-aided translation (CAT) tools, such as the Trados suite, know about markup perfectly well. If the tagging is HTML, rather than some custom XML format, no special preparation is required. Trados will protect the tags from accidental modification, while still allowing changes where necessary. Note that certain elements of tags often need to be localized, e.g. alt text or some query strings, so just stripping all the markup won't do.

Best of all, unless you're working on a zero-budget personal project, consider contacting a localization vendor. Localization is a service just like web design. A competent vendor will help you pick the optimal solution/format for your project and guide you through the preparation of the source material and incorporating the localized result. And of course they and their translators will have all the necessary tools. (Full disclosure: I am a translator / localization specialist. And don't split up strings :)

moodforaday
In this strategy, what do you do when you need to change the markup that is duplicated across dozens of language files? Edit each individually? Sounds like you are mixing concerns here - internationalizing messages and the choice of HTML/markup in the UI.
matt b
From my practical experience, changes in markup often cascade into changes of localizable text. I cannot count the number of projects sent for translation because of a minor change in markup. Semantic changes in markup may require a translation update; (contd.)
moodforaday
(contd.) often though the translation itself will not change. Note that if you work with a pro localization vendor who uses proper tools, charges for such updates will be minimal, since CAT tools will do most of the job here, and translatable wordcount will be minimal. (contd.)
moodforaday
(contd.) The question in any specific case is, does the change in markup require updating the translation.
moodforaday