I'm working on an old-school PHP CMS-based site in Russian, and one of the many new features requested is permalinks.

Currently the website just uses the standard non-MVC 'article.php?id=50' style of URL. I was browsing the Russian wiki, and it was really the only Russian site I've seen that uses native Russian permalinks. I'm wondering:

  1. Are there any limitations on which characters can be used? Does this require any special setup on the server side?
  2. What characters should I look out for in permalinks in general? Any gotchas I need to know about?
  3. Any tips on how I should store the permalinks in my database? Right now the table structure is relatively simple, just an articles table with these columns:

id, article_title, article_snippet, article_whole, date_time

I was thinking of adding a new column to this table named 'permalink', which will store a modified version of article_title (so far the only character I can think of that needs special treatment is the space, which I'll convert to an underscore).
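As a rough illustration of the idea above, here is a minimal Python 3 sketch of deriving the permalink from the title. The helper name `make_permalink` and the set of stripped characters are assumptions for illustration, not part of any framework; the only rule taken from the question is that spaces become underscores.

```python
import re
import unicodedata

# Characters that change how a URL is parsed (an assumed, non-exhaustive list).
_UNSAFE = re.compile(r"[/?#%&+=:;@<>\\]")


def make_permalink(title):
    """Turn an article title into a permalink segment (hypothetical helper)."""
    # Normalize so that visually identical titles map to the same string.
    text = unicodedata.normalize("NFC", title)
    # Drop characters that would be read as URL syntax.
    text = _UNSAFE.sub("", text).strip()
    # Collapse every run of whitespace into a single underscore.
    return re.sub(r"\s+", "_", text)


print(make_permalink("  Заглавная   страница? "))
```

Cyrillic letters are left untouched here; only whitespace and URL syntax characters are treated specially.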

  4. How should I format my new clean URLs? I was thinking something like:

/articles/2009/Заглавная_страница

for example.

By the way, I'll be using Pylons (a Python framework) and MySQL 5, though I'm open to PostgreSQL if there are any weird UTF-8 restrictions (I converted the whole database, which was previously Latin-1, to UTF-8 with iconv).
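For the storage question, one possible shape is a 'permalink' column with a uniqueness constraint, looked up with a parameterized query. The sketch below uses Python's built-in sqlite3 module purely as a runnable stand-in for MySQL or PostgreSQL; the trimmed-down table and the `find_article_id` helper are assumptions for illustration.

```python
import sqlite3

# In-memory SQLite database as a stand-in for the real MySQL/PostgreSQL server.
conn = sqlite3.connect(":memory:")
conn.execute(
    """
    CREATE TABLE articles (
        id INTEGER PRIMARY KEY,
        article_title TEXT NOT NULL,
        permalink TEXT NOT NULL UNIQUE  -- one article per permalink
    )
    """
)
conn.execute(
    "INSERT INTO articles (article_title, permalink) VALUES (?, ?)",
    ("Заглавная страница", "Заглавная_страница"),
)


def find_article_id(conn, permalink):
    """Return the id of the article with this permalink, or None (hypothetical helper)."""
    row = conn.execute(
        "SELECT id FROM articles WHERE permalink = ?", (permalink,)
    ).fetchone()
    return row[0] if row else None


print(find_article_id(conn, "Заглавная_страница"))
```

The unique constraint means two articles with the same title would need distinct permalinks, for example by including the year in the stored value.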

+2  A: 

The current convention is to encode URLs in UTF-8, and then URL-escape (i.e. %-escape) them:

py> urllib.quote(u"articles/2009/Заглавная_страница".encode("utf-8"))
'articles/2009/%D0%97%D0%B0%D0%B3%D0%BB%D0%B0%D0%B2%D0%BD%D0%B0%D1%8F_%D1%81%D1%82%D1%80%D0%B0%D0%BD%D0%B8%D1%86%D0%B0'

After this, there won't be any restrictions: browsers may or may not recognize the escaped bytes as UTF-8 and display them as Cyrillic, but either way they will certainly be able to follow the link.
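The snippet above uses the Python 2 API. For reference, a Python 3 sketch of the same escaping, plus the reverse step a server would apply to an incoming path, might look like this (in Python 3, `urllib.parse.quote` encodes str input as UTF-8 by default and leaves '/' unescaped):

```python
from urllib.parse import quote, unquote

path = "articles/2009/Заглавная_страница"

# Percent-escape the UTF-8 bytes of the path; '/' and '_' are left as-is.
escaped = quote(path)

# Reverse the escaping, decoding the bytes as UTF-8 again.
restored = unquote(escaped)

print(escaped)
print(restored == path)
```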

Martin v. Löwis
Interesting technique.
meder