After researching how people slugify titles, I've noticed that the question of how to deal with non-English titles is often overlooked.
URL encoding is very restrictive. See http://www.blooberry.com/indexdot/html/topics/urlencoding.htm
So, for example, how do folks deal with title slugs for things like
"Una lágrima cayó en la arena"
One can come up with a reasonable table for Indo-European languages, or at least for those that can be encoded in ISO-8859-1. For example, a conversion table would translate 'á' => 'a', so the slug would be
"una-lagrima-cayo-en-la-arena"
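To make the table idea concrete, here is a rough sketch, assuming Python 3. The `TRANSLIT` table and the `slugify_with_table` name are made up for illustration, and the table is deliberately incomplete; a real one would need to cover far more of ISO-8859-1.

```python
import re

# Hypothetical, deliberately incomplete conversion table (illustration only).
TRANSLIT = {
    "á": "a", "é": "e", "í": "i", "ó": "o", "ú": "u",
    "ñ": "n", "ü": "u",
}

def slugify_with_table(title):
    """Lowercase the title, apply the table, and hyphenate what is left."""
    lowered = title.lower()
    # Characters missing from the table pass through unchanged here.
    mapped = "".join(TRANSLIT.get(ch, ch) for ch in lowered)
    # Collapse every run of characters outside a-z and 0-9 into one hyphen,
    # which also swallows any unmapped non-ASCII characters.
    return re.sub(r"[^a-z0-9]+", "-", mapped).strip("-")

print(slugify_with_table("Una lágrima cayó en la arena"))
```

With this table the example title comes out as the slug shown above.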
However, I'm using Unicode (specifically the UTF-8 encoding), so there are no guarantees about what sort of code points I'm going to get; I have to be prepared for characters that can't be encoded in ISO-8859-1.
In a nutshell: how do I deal with this? Should I come up with a conversion table for characters in the ISO-8859-1 range (code points below 256) and drop everything else?
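One way the "convert what you can, drop everything else" plan might look, as a sketch only and assuming Python 3: instead of a hand-written table it leans on the standard library's `unicodedata.normalize` to split accented letters into a base letter plus combining marks, then throws away whatever is not ASCII. The function name `slugify_ascii` is hypothetical.

```python
import re
import unicodedata

def slugify_ascii(title):
    """Strip accents via Unicode decomposition and drop all other non-ASCII."""
    # NFKD splits 'á' into 'a' followed by a combining acute accent (U+0301).
    decomposed = unicodedata.normalize("NFKD", title)
    # Encoding with errors="ignore" silently drops the combining marks and
    # every other character that has no ASCII form (e.g. CJK, Cyrillic).
    ascii_only = decomposed.encode("ascii", "ignore").decode("ascii")
    # Collapse runs of anything outside a-z and 0-9 into a single hyphen.
    return re.sub(r"[^a-z0-9]+", "-", ascii_only.lower()).strip("-")

print(slugify_ascii("Una lágrima cayó en la arena"))
```

Two caveats with this sketch: letters that have no decomposition, such as 'ß' or 'ø', are simply dropped rather than transliterated, and a title written entirely in a non-Latin script yields an empty slug, so a fallback (an ID, for instance) would still be needed.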
EDIT: To give a bit more context: a priori, I don't really expect to slugify data in non-Indo-European languages, but I'd like to have a plan in case I encounter such data. A conversion table for extended ASCII would be nice. Any pointers?
Also, since people are asking: I'm using Python, running on Google App Engine.