Let's say you are building a multilingual web application in which all interface text should be moved to language-dependent resources and loaded when needed. The string resource may be huge: say you have several thousand translated strings. In windowed environments (Windows, OS X, X11) the OS or some API usually provides a mechanism for exactly this, usually called string resources. What about PHP then?

Remember though, performance must be considered seriously here as PHP compiles and executes all your modules with each user request.

I can think of several possible ways of doing it. But first of all, I'm going to have a global variable $LANGUAGE which may be set to 'en', 'de', 'fr' etc. I'll use this variable to include a language-specific module with each request as

require_once "lang-$LANGUAGE.inc.php";
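If $LANGUAGE can ever come from user input, it is probably worth checking it against a list of supported codes before it ends up in a file name. A minimal sketch, assuming PHP 7+; the lang_module() helper, the list of codes and the 'en' fallback are made up for illustration:

```php
<?php
// Hypothetical helper: map a requested language code to a module file name.
// The supported list and the 'en' fallback are assumptions for illustration.
function lang_module(string $language, array $supported = ['en', 'de', 'fr']): string
{
    // Accept only known codes so arbitrary input never reaches the include path.
    if (!in_array($language, $supported, true)) {
        $language = 'en';
    }
    return "lang-$language.inc.php";
}

echo lang_module('de'), "\n";        // lang-de.inc.php
echo lang_module('../../etc'), "\n"; // lang-en.inc.php
```

The result would then be passed to require_once in place of the interpolated string.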

So some of the possible solutions include:

(1) Defining all strings as global vars in each language module, e.g.

$str_signin = 'Sign in';
$str_welcome_user = 'Welcome, %s!';
...

Very simple, easy to read and relatively easy to work on for non-technical people (translators, that is). There is some global space pollution though which will slow down your global variable lookup a bit.

(2) Same but defined as one huge array, e.g.

$str['signin'] = 'Sign in';
$str['welcome_user'] = 'Welcome, %s!';
...

Less readable and a bit less convenient in your main code (more typing involved); it also clutters the code a bit more. This would be slower because these are not simple assignments but associative array assignments: the VM has more instructions to execute here than in (1).

(3) PHP 5.3+: define as constants, possibly in a class or namespace

class str {
    const signin = 'Sign in';
    const welcome_user = 'Welcome, %s!';
    const signin_to_a = self::signin . ' to area A'; // can't do this!
    ...
}

... and use them as str::signin etc. Nice, I like this most of all, although there are a few minor disadvantages as well: PHP 5.3+ only; can't use expressions, only single values (which may or may not be fine in your case); can't use in $-expansion in double-quoted strings (or can you?).

(4) Database: put everything into a table and retrieve by some ID, e.g. str_get(STR_SIGNIN). Ugly, slow, and the IDs in the code need to be kept in sync with the IDs in the DB; on the other hand, there is no need to load everything when all your page needs is just a few strings. Honestly can't say if this is a good solution or not.
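For comparison, here is a rough sketch of how methods (2) and (3) might look at the point of use, assuming PHP 7+. The t() helper and its fallback to the key are hypothetical, not part of any library. On the $-expansion question: class constants are not interpolated inside double-quoted strings, so sprintf() or concatenation is needed, whereas array elements can be interpolated with the {$...} syntax.

```php
<?php
// Method (2): one array per language; in practice this would live in lang-en.inc.php.
$str = [
    'signin'       => 'Sign in',
    'welcome_user' => 'Welcome, %s!',
];

// Method (3): class constants.
class str
{
    const signin = 'Sign in';
    const welcome_user = 'Welcome, %s!';
}

// Hypothetical lookup helper: falls back to the key itself when a string is missing.
function t(array $strings, string $key, ...$args): string
{
    $format = $strings[$key] ?? $key;
    return $args ? vsprintf($format, $args) : $format;
}

echo t($str, 'welcome_user', 'Anna'), "\n";     // Welcome, Anna!
echo sprintf(str::welcome_user, 'Anna'), "\n";  // Welcome, Anna!
echo "{$str['signin']} / " . str::signin, "\n"; // Sign in / Sign in
```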

Any other suggestions? Also, thoughts on these ones?

And please do keep in mind simplicity, elegance and performance!

+1  A: 

What about gettext? Alternatively, Zend Framework provides a really reliable interface to work with translations.

mingos
+2  A: 

Zend Framework has a component called Zend_Translate which is really useful and their manual page has a good write up on the different ways you can store strings, even if you decide not to use the ZF component.

Storing the strings in plain PHP files is the most performant option and the best solution if you're maintaining the strings yourself as a developer. If you're working with a translation company, it's likely they'll expect to work with CSVs and send these back and forth.

I don't know off the top of my head whether an array-based or a constant-based solution is better, but my money is on the arrays. A quick benchmark will soon tell you.

David Caunt
Thanks, I did some tests and posted an answer in this thread, in case you are interested.
mojuba
+1  A: 

I use a MySQL table to store all language strings. This way I can easily create a user interface to edit them. I get all the language strings related to an object as an associative array (with the GetAssoc function in ADODB). I cache the associative array with the Pear::Cache class, so next time I simply retrieve the cached array (no database queries). This has been the best solution for me so far. It can be optimized further using different caching techniques.

bkilinc
This method may sound slow because it involves "database access", but you read the database only once and then store the results in a cache file. The cache file may be a PHP source file containing arrays or constants, anything. In the past I was using language files, but I could not give translators an interface and could not easily locate translation strings.
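A minimal sketch of that cache-file idea, not the Pear::Cache setup described above: the $rows array stands in for whatever the database query returns, and the cache path is a hypothetical choice for illustration.

```php
<?php
// Stand-in for rows fetched from the database (hypothetical data).
$rows = ['signin' => 'Sign in', 'welcome_user' => 'Welcome, %s!'];

// Hypothetical cache location; a real application would pick its own writable directory.
$cacheFile = sys_get_temp_dir() . '/lang-en.cache.php';

// Write the strings out as PHP source that returns an array.
file_put_contents($cacheFile, '<?php return ' . var_export($rows, true) . ';');

// Later requests include the file instead of querying the database.
$str = include $cacheFile;

printf($str['welcome_user'] . "\n", 'Anna'); // Welcome, Anna!
```

The cache file would need to be regenerated whenever a translator saves a change.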
bkilinc
A: 

I ran simple benchmarks for the first three methods. I created modules with 10,000 string assignments and measured loading/compilation times and access times separately.
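This is not the exact script used for those tests, just a rough sketch of how such a measurement could be set up for the array method. The file name, key names and string contents are made up, and the timings will vary from machine to machine.

```php
<?php
// Generate a module with 10,000 array assignments (method 2) in a temp file.
$n = 10000;
$source = "<?php\n";
for ($i = 0; $i < $n; $i++) {
    $source .= "\$str['key$i'] = 'String number $i';\n";
}
$file = sys_get_temp_dir() . '/bench-array.inc.php'; // hypothetical file name
file_put_contents($file, $source);

// Time loading and compilation of the generated module.
$start = microtime(true);
include $file;
$loadTime = microtime(true) - $start;

// Time one read access to every entry.
$start = microtime(true);
$total = 0;
foreach (array_keys($str) as $key) {
    $total += strlen($str[$key]);
}
$accessTime = microtime(true) - $start;

printf("load: %.4f s, access: %.4f s, entries: %d\n", $loadTime, $accessTime, count($str));
```

The same skeleton could generate global variables or class constants instead of array entries to compare the three methods.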

What wasn't surprising was that constants were much faster to load and compile compared to globals and the array method. What surprised me though was that constants were significantly slower to access! Possibly because it's a new mechanism in PHP and hasn't been polished yet.

Besides, it turns out the huge array is faster to compile (in fact the array(...) construct is even faster) than the all-globals method, but then array access is more expensive.

Considering that you usually load a huge amount of locale data and then use only a small part of it, I think the most sensible method would be constants. Not surprisingly, class constants perform better than global ones.

So I'd go for method 3. I haven't tested the database method, but something suggests reading the entire database table would be roughly the same or more expensive than reading a PHP source of the same volume.

mojuba
I would go for method 2. Const access is slow (as I had suspected) and you can use something like APC to opcode cache your array files. This skips the 'compilation' step. Then you can benefit from fast access.
David Caunt