Hello guys,

I was wondering how you deal with permalinks on international sites. By permalink I mean a link that is unique and human-readable.

For English phrases it's no problem, e.g. /product/some-title/

But what do you do if the product title is in, say, Chinese? How do you deal with this problem?

I am implementing an international site and one requirement is to have human-readable URLs. Thanks for every comment.

A: 

If memory serves, you're only able to use English letters in URLs. There's a discussion to change that, but I'm fairly positive that it's not been implemented yet.

That said, you'd need a lookup table that maps each product or title to the word used for it in every other language. For example:

foo.com/cat will need a translation lookup for "cat", "gato", "neko", etc.

Then the HTTP module that parses those human-readable names into an exact URL will know which page to serve based on the translations.
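
A minimal sketch of such a lookup, in Python. The table contents and the names SLUG_TO_PRODUCT and resolve_product are made up for illustration; a real site would presumably keep this mapping in a database.

```python
# Hypothetical lookup table: localized URL word -> internal product key.
SLUG_TO_PRODUCT = {
    "cat": "cat",    # English
    "gato": "cat",   # Spanish
    "neko": "cat",   # Japanese (romanized)
}

def resolve_product(slug):
    """Return the internal product key for a localized slug, or None if unknown."""
    return SLUG_TO_PRODUCT.get(slug.lower())
```

With this table, foo.com/cat, foo.com/gato and foo.com/neko would all resolve to the same product, and an unknown word returns None so the caller can answer with a 404.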

Stephen Wrighton
A: 

Creating a lookup for this seems like overkill to me. I cannot create a lookup for all the different words in all languages. Maybe accessing a translation API would be a good idea.

So as far as I can see, it's not possible to use foreign characters in the permalink, as the URL spec does not allow it.

What do you think of encoding the special characters? Are those URLs recognized by Google then?
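
For reference, this is roughly what encoding the special characters looks like: the title is encoded as UTF-8 and every byte outside the plain ASCII set is percent-escaped. A small sketch using Python's standard urllib.parse; the function name encode_path_segment and the sample title are only for illustration.

```python
from urllib.parse import quote, unquote

def encode_path_segment(title):
    """Percent-encode a title so it can be used as one URL path segment."""
    # quote() encodes the text as UTF-8 and escapes each non-ASCII byte;
    # safe="" means "/" is escaped too, so the title stays a single segment.
    return quote(title, safe="")

encoded = encode_path_segment("prodöktä")
print(encoded)           # prod%C3%B6kt%C3%A4
print(unquote(encoded))  # prodöktä
```

The encoded form is plain ASCII, so it is a legal URL. How readable it looks to the visitor depends on whether the browser decodes it for display.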

Michal
Stephen Wrighton
+2  A: 

Characters outside the ISO Latin-1 set are not permitted in URLs according to this spec, so Chinese strings would be out immediately.

Where the product name can be localised, you can use urls like <DOMAIN>/<LANGUAGE>/DIR/<PRODUCT_TRANSLATED>, e.g.:

http://www.example.com/en/products/cat/
http://www.example.com/fr/products/chat/

accompanied by a mod_rewrite rule to the effect of:

RewriteRule ^([a-z]+)/products/([a-z]+)/?$ product_lookup.php?lang=$1&product=$2

For the first example above, this rule will call product_lookup.php?lang=en&product=cat. Inside this script you would use the lang parameter (en in this case) to query the same internal translation engine that produces the user-facing text, e.g. "Chat" on the French page and "Cat" on the English one, and so find the product that the translated name refers to.

Using an external translation API would be a good idea, but it is tricky to find a reliable one that works correctly in your business domain. Google has opened up a translation API, but it currently supports only a limited number of language pairs:

  • English <=> Arabic
  • English <=> Chinese
  • English <=> Russian
ConroyP
Osama ALASSIRY
+1  A: 

How about some scheme like /productid/{product-id-number}/some-title/, where the site looks at the {number} and ignores the 'some-title' part entirely? You can put that into whatever language or encoding you like, because it's not being used.
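
A rough sketch of that idea in Python: pull the number out of the path and discard whatever follows. The regular expression and the name product_id_from_path are assumptions for illustration, not part of any framework.

```python
import re

# Assumed URL shape: /productid/<number>/<any-title>/
# The title segment is optional and its contents are never inspected.
PRODUCT_PATH = re.compile(r"^/productid/(\d+)(?:/[^/]*)?/?$")

def product_id_from_path(path):
    """Return the numeric product id, ignoring the title that follows it."""
    match = PRODUCT_PATH.match(path)
    return int(match.group(1)) if match else None
```

Both /productid/122/some-title/ and /productid/122/prodöktä/ yield 122, so the title can be in any language without affecting the lookup.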

Alister Bulman
That's a really good and interesting idea. This points to my problem:
1. The URL is illegal then, isn't it? Let's say: /product/122/prodöktä/
2. Does Google recognize those URLs then?
Michal
It looks to be quite possible - http://dmoz.org.il/ - though you may do better encoding them as UTF8 and seeing if they are decoded in the browser. Google et al would also likely do better with them.
Alister Bulman
I think this is how I will do it. I will remove all the special characters (e.g. / $ " ') from the title and bung the title straight into the link. But when handling the request I will ignore the title and use only the ID.
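
Something along these lines would do the stripping, sketched in Python. The name title_to_slug and the choice of a hyphen as separator are assumptions; letters from any script are kept, only punctuation and whitespace are replaced.

```python
import re

def title_to_slug(title):
    """Lowercase a title and turn every run of special characters into one hyphen."""
    # \W matches anything that is not a letter, digit or underscore; for str
    # patterns in Python 3 this is Unicode-aware, so "ö" or CJK letters are kept.
    slug = re.sub(r"[\W_]+", "-", title.lower())
    # Drop hyphens left over at either end.
    return slug.strip("-")
```

For example, title_to_slug("Prodöktä 2000") gives "prodöktä-2000", which would still need percent-encoding or transliteration before it goes into the link.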
Michal
Make sure you confirm the title portion of the URL on each page load, so people can't link to http://happystuff.com/3/child-porn-gangbang-rape/. You also don't want duplicate content :P
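
One way to do that check, sketched in Python: rebuild the canonical path from the ID and the stored title, and redirect whenever the requested path differs. The function name canonical_redirect and the /productid/ URL shape are assumptions for illustration.

```python
def canonical_redirect(requested_path, product_id, canonical_slug):
    """Return the path to redirect to, or None if the request is already canonical."""
    # The canonical path is built only from data the site controls,
    # never from the title the visitor happened to put in the URL.
    canonical_path = f"/productid/{product_id}/{canonical_slug}/"
    if requested_path == canonical_path:
        return None
    return canonical_path
```

The caller would answer with a permanent (301) redirect whenever this returns a path, so every product ends up with exactly one URL.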
orlandu63
+2  A: 

I usually transliterate the non-ASCII characters. For example "täst" would become "taest". GNU iconv can do this for you (I'm sure there are other libraries):

$ echo täst | iconv -t 'ascii//translit'
taest

Alas, these transliterations are locale-dependent: in languages other than German, 'ä' could be transliterated as simply 'a', for example. On the other hand, there should be a transliteration into ASCII for every (commonly used) character set.
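
The same idea can be sketched in Python with only the standard library. This is not what iconv does internally; it is a simplified stand-in. The German replacement table and the name to_ascii are assumptions, and characters with no ASCII base form (Chinese, for instance) are simply dropped here.

```python
import unicodedata

# Assumed German-style replacements, applied before the generic fallback.
GERMAN_MAP = str.maketrans({"ä": "ae", "ö": "oe", "ü": "ue", "ß": "ss"})

def to_ascii(text, german=False):
    """Reduce text to ASCII, optionally using German spelling conventions."""
    # Compose first so that "a" + combining diaeresis is seen as a single "ä".
    text = unicodedata.normalize("NFC", text)
    if german:
        text = text.translate(GERMAN_MAP)
    # NFKD splits "ä" into "a" plus a combining mark; encoding with
    # errors="ignore" then drops everything that is not ASCII.
    decomposed = unicodedata.normalize("NFKD", text)
    return decomposed.encode("ascii", "ignore").decode("ascii")
```

Here to_ascii("täst") gives "tast" while to_ascii("täst", german=True) gives "taest", which is the locale dependence described above.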

hop
+2  A: 
Maxim
Michal