I rewrite URLs to include the title of user generated travelblogs.
I do this for both readability of URLs and SEO purposes.
http://www.example.com/gallery/280-Gorges_du_Todra/
The first integer is the id, the rest is for us humans (but is irrelevant for requesting the resource).
Now people can write titles containing any UTF-8 character, but most are not allowed in the URL. My audience is generally English speaking, but since they travel, they like to include names like
Aït Ben Haddou
What is the proper way to translate this for displaying in an URL using PHP on linux.
So far I've seen several solutions:
just strip all non allowed characters, replace spaces this has strange results:
'Aït Ben Haddou' → /gallery/280-At_Ben_Haddou/
Not really helpfull.just strip all non allowed characters, replace spaces, leave charcode (stackoverflow.com) most likely because of the 'regex-hammer' used
this gives strange results:'tést tést' → /questions/0000/t233st-t233st
translate to 'nearest equivalent'
'Aït Ben Haddou' → /gallery/280-Ait_Ben_Haddou/
But this goes wrong for german; for example 'ü' should be transliterated 'ue'.
For me, as a Dutch person, the 3rd result 'looks' the best.
I'm quite sure however that (1) many people will have a different opinion and (2) it is just plain wrong in the german example.
Another problem with the 3rd option is: how to find all possible characters that can be converted to a 7bit equivalent?
So the question is:
what, in your opinion, is the most desirable result. (within tech-limits)
How to technically solve it. (reach the desired result) with PHP.