I am going to work on a project where a fairly large web app needs to be tweaked to handle several languages. It runs on hand-crafted PHP code, but the code is pretty clean.

I was wondering what would be the best way to do that?

  1. Making something on my own, trying to fit the current architecture.

  2. Rewriting a good part of it using a framework (e.g. symfony) that will manage i18n for me?

If 1., where to store the i18n data? *.po, xliff, pure DB?

I thought about an alternative: using symfony only for the translation, with the controller simply loading the website as it already is. Quick, but dirty. On the other hand, it would let us make later modifications gradually, moving slowly to full symfony: this web site is really a good candidate.

But maybe there are some standalone translation engines that would do the job better than an entire web framework. It's a bit like using a bazooka to kill a fly...

+1  A: 

Work with language files.

  1. Replace each text by a variable
  2. Create one language file per language and in it define each variable with their corresponding text. (french.inc, dutch.inc ...)
  3. Include the right file in each page.

That's for small sites.

If getting bigger, replace the files by a DB. :)
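The language-file steps above can be sketched roughly as follows. This is only an illustration: the `load_language()` helper, the `greeting`/`farewell` keys and the temporary directory are hypothetical, and the files here return an array instead of defining one variable per text, which is an assumption made to keep the example short. The two `.inc` files are written on the fly so the snippet runs on its own; on a real site they would be ordinary files kept next to the pages.

```php
<?php
// Hypothetical stand-ins for french.inc / dutch.inc, created in a temp directory.
$dir = sys_get_temp_dir() . '/lang_files_demo';
if (!is_dir($dir)) {
    mkdir($dir, 0777, true);
}
file_put_contents("$dir/french.inc", '<?php return ["greeting" => "Bonjour", "farewell" => "Au revoir"];');
file_put_contents("$dir/dutch.inc", '<?php return ["greeting" => "Hallo", "farewell" => "Tot ziens"];');

// Load the texts for one language, falling back to a default language.
function load_language(string $dir, string $language, string $fallback = 'french'): array
{
    // Whitelist the names so a request parameter can never select an arbitrary file.
    $allowed = ['french', 'dutch'];
    if (!in_array($language, $allowed, true)) {
        $language = $fallback;
    }
    // include returns whatever the included file returns, here the array of texts.
    return include "$dir/$language.inc";
}

// Step 3: include the right file in each page, then print texts by key.
$text = load_language($dir, 'dutch');
echo $text['greeting'], "\n"; // Hallo
```

Each page then prints `$text['some_key']` instead of a hard-coded string, so adding a language means adding one more file with the same keys.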

Veynom
A: 

You could look at Zend_Translate: it's pretty comprehensive and well documented, and the overall code quality is great. It also gives you a unified API for gettext, CSV, DB, INI files, arrays or whatever you end up saving your translated strings in.

Also, look at/watch this thread: "What are good tools/frameworks for i18n of a php codebase?". It seems similar to your question.

Till
A: 

If multi-byte character support is the concern, it might be worth checking out the multibyte string functions in PHP:

http://uk.php.net/manual/en/book.mbstring.php

These will better handle multi-byte characters.
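As a small illustration of the difference, assuming the mbstring extension is enabled and the strings are UTF-8 (the sample word is just an example):

```php
<?php
// "héllo" written with an escape so the example does not depend on the file encoding.
$word = "h\u{e9}llo";

// strlen() counts bytes; mb_strlen() counts characters.
echo strlen($word), "\n";             // 6, because "é" takes two bytes in UTF-8
echo mb_strlen($word, 'UTF-8'), "\n"; // 5

// substr($word, 0, 2) would cut "é" in half; mb_substr() cuts on character boundaries.
$firstTwo = mb_substr($word, 0, 2, 'UTF-8');

// strtoupper() works byte by byte; mb_strtoupper() also handles accented letters.
$upper = mb_strtoupper($word, 'UTF-8');

echo $firstTwo, "\n";
echo $upper, "\n";
```

Passing the encoding explicitly avoids depending on the server's `mbstring` defaults.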

Rick Curran
A: 

I use an hl parameter together with gettext, combining the translations the engine already ships with my own .po files, so new translations and languages appear whenever either the engine or the app adds them. My django/gae example:

    {% get_current_language as LANGUAGE_CODE %}{{ LANGUAGE_CODE }}
    {% get_available_languages as LANGUAGES %}
    {% for LANGUAGE in LANGUAGES %}
      {% ifnotequal LANGUAGE_CODE LANGUAGE.0 %}{{ LANGUAGE.0 }}{% endifnotequal %}
    {% endfor %}

This avoids duplicates and makes full use of the translations that are already there, so missing ones (e.g. Arabic month names) appear directly as soon as either the engine team or the app adds them.

LarsOn
+6  A: 

There are a number of ways of tackling this. None of them is "the best way" and all of them have problems in the short term or the long term. The very first thing to say is that multilingual sites are not easy: translators are lovely people but hard to work with, and most programmers see the problem as a technical one only. There is also another dimension, outside the scope of this answer, as to whether you are translating or localising. Localising involves looking at the target audience's cultural mores and then tailoring language, style, layout, colour, typeface etc. to that culture. Finally, do not use MT (Machine Translation) for anything serious or anything that needs to be accurate, and when acquiring translators ensure that they are translating from a foreign language into their native language, which means that they understand all the nuances of the target language.

Right. Solutions. On the basis that you do not want to rewrite the site, simply clone the site you have and translate the copies into the target languages. Assuming the code base is stable, you can use a VCS to manage any code changes. You can tweak individual parts of each site to fit the target language. For example, French text is on average 30% larger than the equivalent English text, so using one site to deliver both means you may (will) have formatting problems and need to swap a different CSS file in and out depending on the language. It might seem a clunky way to do it, but then how long are the sites going to exist? The management overhead of doing it this way may well be less than that of the other options.

Second way without rebuilding. Replace all content in the current site with tags and then put the text for each language in files or DB tables. Sniff the user's desired language (do you have registered users who can set a preference, do you want to read the browser's language tag, or is it going to be the URL, dot-com, dot-fr, dot-de, that makes the choice?) and then replace the tags with the text in the target language. Then you need to address the sizing issues and the image issues separately. This is in effect what frameworks like Symfony and Zend do to implement l10n.
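A minimal sketch of that tag-replacement idea, assuming the browser's Accept-Language header drives the choice. The `pick_language()` and `render()` helpers, the `{{tag}}` syntax and the tiny in-memory catalogue are all hypothetical; a real site would load the strings from files or DB tables and would also need to escape them for HTML.

```php
<?php
// Choose the best supported language from an Accept-Language header value.
function pick_language(string $acceptLanguage, array $supported, string $default): string
{
    $candidates = [];
    foreach (explode(',', $acceptLanguage) as $part) {
        $pieces = explode(';', trim($part));
        // Keep only the primary subtag: "fr-CH" becomes "fr".
        $tag = strtolower(substr(trim($pieces[0]), 0, 2));
        $q = 1.0; // the quality value defaults to 1 when absent
        if (isset($pieces[1]) && preg_match('/q=([0-9.]+)/', $pieces[1], $m)) {
            $q = (float) $m[1];
        }
        if ($tag !== '' && !isset($candidates[$tag])) {
            $candidates[$tag] = $q;
        }
    }
    arsort($candidates); // highest preference first
    foreach (array_keys($candidates) as $tag) {
        if (in_array($tag, $supported, true)) {
            return $tag;
        }
    }
    return $default;
}

// Replace {{tag}} placeholders with the text for the chosen language.
function render(string $template, array $strings): string
{
    return preg_replace_callback('/\{\{(\w+)\}\}/', function (array $m) use ($strings): string {
        // Leave unknown tags visible so missing translations are easy to spot.
        return $strings[$m[1]] ?? $m[0];
    }, $template);
}

$catalogue = [
    'en' => ['welcome' => 'Welcome'],
    'fr' => ['welcome' => 'Bienvenue'],
];

$header = $_SERVER['HTTP_ACCEPT_LANGUAGE'] ?? 'fr-CH, fr;q=0.9, en;q=0.8';
$language = pick_language($header, array_keys($catalogue), 'en');
echo render('<h1>{{welcome}}</h1>', $catalogue[$language]), "\n";
```

A stored user preference or the domain (dot-com, dot-fr, dot-de) could replace `pick_language()` without touching `render()`.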

Then you could rebuild with a framework or with gettext and possibly have a cleaner solution, but remember that frameworks were designed to solve other problems, not translation, and the translation component has come into the framework as a partial solution, not the full one.

The big problem with all the solutions is ongoing maintenance, because not only do you have a code base but also multiple language bases to maintain. Unless your all-in-one solution is really clever and effective, the ongoing task will be difficult.

PurplePilot
+3  A: 

It is important to notice that there are two steps involved before translating:

  1. Internationalization: that is, enabling your site to handle multiple languages
  2. Localization: this includes translating your texts (obtained in step 1) to each language you plan to support

See more on this in Wikipedia.

Step 1 would require you to take into account that some languages are written right to left (RTL) and that some use non-European characters, such as Japanese or Chinese. If you are not planning to handle these languages and characters, it might be simpler.

For this type of situation I would prefer to have a language file (actually as many language files as languages I plan to support, naming each as langcode.php as in en.php or fr.php) with an associative array containing all the texts used in the site. The procedure would go as follows:

  1. Scan your site for every single text that should be localized
  2. For each page/section I would create a $lang['sectionname'][] array
  3. For each text I would create a $lang['sectionname']['textname'] entry
  4. I would create a Lang.php class that would receive a lang parameter upon instantiation but would have a default in case no lang is received (this method loads langcode.php depending on the parameter or a default depending on your preferred language)
  5. The class would have a setPage() method that would receive the page/section you will be displaying
  6. The class would have a show() method that would receive the text to be displayed (show() would be called as many times as texts are shown in a given page... show() being a kind of wrapper for echo $lang['mypage']['mytext'])
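The procedure above could be sketched like this. It is an illustration under assumptions, not the exact class described: the constructor also takes a directory, the language files return their array rather than defining `$lang` directly, `show()` returns an escaped string instead of echoing, and the sample files are written to a temporary directory so the snippet runs on its own. It uses typed properties, so it assumes PHP 7.4 or later.

```php
<?php
class Lang
{
    private array $texts;
    private string $page = '';

    public function __construct(string $dir, string $code = 'en', string $default = 'en')
    {
        // Accept only plain two-letter codes that have a file; otherwise use the default.
        if (!preg_match('/^[a-z]{2}$/', $code) || !is_file("$dir/$code.php")) {
            $code = $default;
        }
        $this->texts = include "$dir/$code.php";
    }

    // Select the page/section whose texts show() should look up.
    public function setPage(string $page): void
    {
        $this->page = $page;
    }

    // Return the HTML-escaped text, or a visible marker when the entry is missing.
    public function show(string $name): string
    {
        $text = $this->texts[$this->page][$name] ?? "[$this->page.$name]";
        return htmlspecialchars($text, ENT_QUOTES, 'UTF-8');
    }
}

// Hypothetical en.php / fr.php, each holding a ['sectionname']['textname'] array.
$dir = sys_get_temp_dir() . '/lang_class_demo';
if (!is_dir($dir)) {
    mkdir($dir, 0777, true);
}
file_put_contents("$dir/en.php", '<?php return ["home" => ["title" => "Welcome", "intro" => "Fish & chips"]];');
file_put_contents("$dir/fr.php", '<?php return ["home" => ["title" => "Bienvenue"]];');

$lang = new Lang($dir, 'fr');
$lang->setPage('home');
echo $lang->show('title'), "\n"; // Bienvenue
echo $lang->show('intro'), "\n"; // [home.intro], since fr.php has no such entry
```

The visible marker for missing entries is a design choice that makes untranslated texts easy to find; falling back to the base language would be an equally reasonable alternative.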

This way you could have as many languages as you want in a very easy way. You could even have a language admin where you open your base language page (you actually just read recursively the arrays and display them in textareas) and can then "Save as..." some other language.

I use a similar approach on my site. It is only one page, though I have also made multi-page sites with this idea.

If you have user-submitted content or some rather complicated CMS it would be a different story. You could look for i18n-friendly frameworks (Drupal comes to mind).

mga
Localization is not translating the text. Localization is rather adapting to the locales, currency, culture, audience, etc.
Gumbo
you're right in that translation is a part of localization: "Localization is the process of adapting internationalized software for a specific region or language by adding locale-specific components and translating text." (Wikipedia)
mga
edited to reflect that translating is part of l10n
mga