Most of the world uses non-ASCII characters. But some languages use characters such as é, ö, á, ã and õ, which can be "converted" to ASCII.

Suppose the title of the post is:

Configuração é fácil!

How should that be represented in a URL?

www.myblog.com/post/1200/Configura__o-_-f_cil

A much better representation is

www.myblog.com/post/1200/Configuracao-e-facil

Wikipedia does this, as in http://en.wikipedia.org/wiki/Deja_vu

Will this improve page rank in search engines?

How would you do that in your favorite language?

A: 

In Perl

Use Text::Unidecode:

#!/usr/bin/perl -w

use utf8;
use Text::Unidecode;
print unidecode(
    "áéíóú\n"
);

# That prints: aeiou
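
For comparison, here is a rough sketch of the same idea in Python using only the standard library. The function name `slugify` and the choice to turn every run of non-alphanumeric characters into a single hyphen are assumptions made for illustration, not part of Text::Unidecode. Note that this approach silently drops characters that have no ASCII base letter (ß, Chinese, kana), which Text::Unidecode would instead transliterate.

```python
import re
import unicodedata


def slugify(title: str) -> str:
    """Hypothetical helper: turn a post title into an ASCII URL slug."""
    # NFKD splits an accented letter into its base letter plus a combining mark.
    decomposed = unicodedata.normalize("NFKD", title)
    # Encoding to ASCII with "ignore" drops the combining marks, and also
    # (an assumption of this sketch) any character without an ASCII base.
    ascii_only = decomposed.encode("ascii", "ignore").decode("ascii")
    # Collapse each run of non-alphanumeric characters into one hyphen.
    return re.sub(r"[^A-Za-z0-9]+", "-", ascii_only).strip("-")


print(slugify("Configuração é fácil!"))  # Configuracao-e-facil
```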
motobói
A: 

So what are you going to do about Chinese characters? Or Japanese kana? Or the German scharfes S (ß)? I think you need to think about these things before implementing this feature.

DrJokepu
Maybe one can just ignore any character that can't be converted to ASCII.
motobói
'Scharfes S' (ß) is generally decomposed into two s's: Straße => Strasse
Markus Schnell
A: 

Wikipedia might be a good place to look for examples of what you're trying to do.

http://en.wikipedia.org/wiki/Deja_vu

http://en.wikipedia.org/wiki/Straßenbahn
robmandu
A: 

The problem with transliteration is that you might lose or change the meaning of the words.

Take for example the German words Buße (Engl. penance) and Busse (Engl. buses), or Maße (Engl. measures, dimensions) and Masse (Engl. mass).
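
One way to see how often this happens in your own data is to group words by their transliterated form and look for groups with more than one member. The sketch below is in Python. The `SPECIAL` table and the names `to_ascii` and `find_collisions` are hypothetical, and the table only covers ß because Unicode normalization does not decompose it.

```python
import unicodedata

# Assumed mapping for letters that NFKD cannot decompose; extend as needed.
SPECIAL = {"ß": "ss", "ẞ": "SS"}


def to_ascii(word: str) -> str:
    """Transliterate a word to ASCII (illustrative, not exhaustive)."""
    replaced = "".join(SPECIAL.get(ch, ch) for ch in word)
    decomposed = unicodedata.normalize("NFKD", replaced)
    return decomposed.encode("ascii", "ignore").decode("ascii")


def find_collisions(words):
    """Return the ASCII forms that more than one distinct input maps to."""
    groups = {}
    for word in words:
        groups.setdefault(to_ascii(word), []).append(word)
    return {key: group for key, group in groups.items() if len(group) > 1}


print(find_collisions(["Buße", "Busse", "Maße", "Masse", "Straße"]))
# {'Busse': ['Buße', 'Busse'], 'Masse': ['Maße', 'Masse']}
```

If the slug sits next to a unique ID in the URL, as in /post/1200/ above, such collisions only affect readability, not routing.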

Gumbo