What options are there for localizing an app on Google App Engine? How do you do it using webapp, Django, web2py or [insert framework here]?

1. Readable URLs and entity key names

Readable URLs are good for usability and search engine optimization (Stack Overflow is a good example of how to do it). On Google App Engine, key-based queries are recommended for performance reasons. It follows that it is good practice to use the entity key name in the URL, so that the entity can be fetched from the datastore as quickly as possible.

Some characters have special meaning in URLs (&, ", ', etc.). To be usable as part of a URL, key names should not contain any of these characters. Currently I use the function below to create key names:

import re
import unicodedata

def urlify(unicode_string):
    """Translate a Latin-1 Unicode string to URL-friendly ASCII.

    Converts accented Latin-1 characters to their non-accented ASCII
    counterparts, converts to lowercase, converts whitespace to hyphens
    and removes all other characters except ASCII letters, digits,
    underscores and hyphens.

    Arguments
        unicode_string:     Unicode string.

    Returns
        String consisting of ASCII letters, digits, underscores and hyphens.
    """
    # NFKD splits accented characters into a base letter plus combining
    # marks; encoding to ASCII with 'ignore' then drops the marks.
    # Decoding back to text lets the regexes below also run on Python 3.
    ascii_string = unicodedata.normalize('NFKD', unicode_string).encode(
        'ASCII', 'ignore').decode('ASCII')
    ascii_string = re.sub(r'[^\w\s-]', '', ascii_string).strip().lower()
    return re.sub(r'[-\s]+', '-', ascii_string)

This is basically a whitelist of approved characters. It works fine for English and Swedish, but it fails for non-Western scripts and removes letters from some Western ones (such as Norwegian and Danish with their æ and ø, which have no NFKD decomposition and are therefore dropped).
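A quick check illustrates the limitation. This is only a sketch: it assumes Python 3, repeats a compact copy of the function so it runs on its own, and the sample words are arbitrary.

```python
import re
import unicodedata

def urlify(unicode_string):
    # Compact copy of the function above (Python 3 assumed).
    ascii_string = unicodedata.normalize('NFKD', unicode_string).encode(
        'ASCII', 'ignore').decode('ASCII')
    ascii_string = re.sub(r'[^\w\s-]', '', ascii_string).strip().lower()
    return re.sub(r'[-\s]+', '-', ascii_string)

# Swedish: a-ring, a-umlaut and o-umlaut decompose into a base letter
# plus a combining mark, so the base letter survives.
swedish = urlify(u'Räksmörgås')

# Danish: o-slash has no decomposition, so it is silently dropped.
danish = urlify(u'Smørrebrød')

print(swedish)  # raksmorgas
print(danish)   # smrrebrd
```

The second result shows the problem: two different titles that differ only in such letters could collapse to the same key name.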

Can anyone suggest a method that works with more languages? Would it be better to remove problematic characters (blacklist)?

2. Translating templates

Does Django internationalization and localization work on Google App Engine? Are there any extra steps that must be performed? Is it possible to use Django i18n and l10n for Django templates while using Webapp?

The Jinja2 template language provides integration with Babel. How well does this work, in your experience?

What options are available for your chosen template language?

3. Translated datastore content

When serving content from (or storing it to) the datastore: is there a better way than reading the *accept_language* value from the HTTP request and matching it against a language property set on each entity?

+1  A: 

Concerning point 2, I asked a similar question a few months ago. I've managed to get the application internationalized, but only the content, not the URLs (I wasn't planning on doing those either).

I've also added the revision I made to my code so that people can see what changes went into i18n'ing this Google App Engine app. Look at my second comment on the accepted answer.

Good luck with your other 2 points!

Emilien
+1  A: 

Regarding point 1, there's really no need to go to such lengths: Simply use unicode key names. They'll be encoded as UTF-8 in the datastore for you.
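If a Unicode key name also ends up as the last URL segment, as described in the question, it has to be percent-encoded when building links. A minimal sketch using only the standard library, assuming Python 3's urllib.parse; the helper names and the /entry/ path prefix are hypothetical, and the datastore lookup itself is left out:

```python
from urllib.parse import quote, unquote

def key_name_to_path(key_name):
    """Build a URL path whose last segment is the percent-encoded key name."""
    # quote() encodes the text as UTF-8 and escapes non-ASCII bytes;
    # safe='' makes it escape '/' as well, so the key stays one segment.
    return '/entry/' + quote(key_name, safe='')

def path_to_key_name(path):
    """Recover the key name from the last segment of a raw (undecoded) path."""
    segment = path.rsplit('/', 1)[-1]
    return unquote(segment)

path = key_name_to_path(u'smørrebrød')
print(path)                    # /entry/sm%C3%B8rrebr%C3%B8d
print(path_to_key_name(path))  # smørrebrød
```

Whether your framework hands the request handler the raw or the already-decoded segment is worth checking before decoding it yourself.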

Regarding point 3, there are many ways to handle language detection. Certainly accept_language should be part of it, and you'll find webob's accept_language support particularly useful here (hopefully Django or your framework-of-choice has something similar). It's quite often the case, however, that a user's browser's language configuration isn't correct, so you'll want to make sure there's some way for the user to override the detected language - for example, with a link on each page to change the language, setting a preference cookie.
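The order of precedence described above (explicit user preference first, then the browser's header, then a default) could be sketched roughly as follows. This is plain Python rather than webob's API, the function names are made up for illustration, and the parsing is deliberately simplified:

```python
def parse_accept_language(header):
    """Return language tags from an Accept-Language header, best first."""
    weighted = []
    for part in header.split(','):
        piece = part.strip()
        if not piece:
            continue
        tag, _, params = piece.partition(';')
        quality = 1.0  # a tag without a q-value counts as q=1
        params = params.strip()
        if params.startswith('q='):
            try:
                quality = float(params[2:])
            except ValueError:
                quality = 0.0  # treat a malformed weight as "not wanted"
        if quality > 0:
            weighted.append((quality, tag.strip().lower()))
    # The sort is stable, so tags with equal weight keep header order.
    weighted.sort(key=lambda item: item[0], reverse=True)
    return [tag for _, tag in weighted]

def choose_language(cookie_lang, accept_header, supported, default='en'):
    """Pick a language: cookie override, then header, then default."""
    if cookie_lang and cookie_lang.lower() in supported:
        return cookie_lang.lower()
    for tag in parse_accept_language(accept_header or ''):
        if tag in supported:
            return tag
        primary = tag.split('-')[0]  # e.g. 'sv-se' falls back to 'sv'
        if primary in supported:
            return primary
    return default

supported = {'en', 'sv'}
print(choose_language(None, 'sv-SE,sv;q=0.8,en;q=0.5', supported))  # sv
print(choose_language('en', 'sv-SE,sv;q=0.8,en;q=0.5', supported))  # en
```

The cookie value would be set by the "change language" link; how it is read and written depends on the framework in use.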

Nick Johnson
+1 for overriding auto-detection.
Colonel Sponsz
I did not know that Internationalized Resource Identifiers (RFC 3987) were supported by all major browsers except IE6. However, that is what the W3C tells us: http://www.w3.org/International/articles/idn-and-iri/
Petri Pennanen
IRIs have nothing to do with datastore key names.
Nick Johnson
My original question was unclear... I have used key names as the last part of URLs. They were passed as parameters to a webapp request handler. My concern was that internationalized strings would not be passed to the request handler intact (I was unaware of RFC 3987).
Petri Pennanen