Duplicate of 249087

I have a bunch of user-generated addresses that may contain characters with diacritic marks. What is the most effective (i.e. generic) way, apart from a straightforward character-by-character replace, to automatically convert any such characters to their closest English equivalents?

E.g. any of àâãäå would become a

æ would become the two separate letters ae

ç would become c

any of èéêë would become e

etc. for all possible letter variations (preferably without having to find and encode lookups for each diacritic form of the letter).

(Note: I have to pass these addresses on to third party software that is incapable of printing anything other than English characters. I'd rather the software was capable of handling them, but I have no control over that.)

EDIT: Never mind... Found the answer [here][2]. It showed up in the "Related" section to the right of the question after I posted, but not in my prior search or as a pre-post suggestion. Hmm. I added the 'diacritics' tag to the other question in any case.

EDIT 2: Jeez! Who voted this -1 after I closed it?

+1  A: 

I was just going to post the same link :-)

Sounds like you're doing this already, but I would recommend that you store the original string for display in your application, and only do this for the 3rd-party stuff. People get cranky if they don't think their real name is important :-)
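For what it's worth, here is a minimal sketch of that idea in Python (the question doesn't name a language, so that choice is an assumption, as are the names `to_ascii` and `EXTRA`). It decomposes each character with Unicode normalization and drops the combining marks, which covers the à/ç/é cases without a per-letter lookup. Letters such as æ have no decomposition, so a small hand-written table is still needed for those; the one below is illustrative, not exhaustive.

```python
import unicodedata

# Letters that Unicode does not decompose into base letter + mark.
# Illustrative and deliberately incomplete; extend as needed.
EXTRA = {
    "æ": "ae", "Æ": "AE",
    "œ": "oe", "Œ": "OE",
    "ø": "o", "Ø": "O",
    "ß": "ss",
    "ł": "l", "Ł": "L",
}

def to_ascii(text):
    """Return an ASCII-only approximation of text."""
    # 1. Handle the letters that normalization cannot split.
    replaced = "".join(EXTRA.get(ch, ch) for ch in text)
    # 2. Decompose, e.g. "é" becomes "e" followed by a combining acute accent.
    decomposed = unicodedata.normalize("NFKD", replaced)
    # 3. Drop the combining marks, keeping the base letters.
    stripped = "".join(ch for ch in decomposed if not unicodedata.combining(ch))
    # 4. Discard anything that is still outside ASCII.
    return stripped.encode("ascii", "ignore").decode("ascii")

# Keep the original for display; derive the ASCII form only for the third party.
original = "12 Rue de l'Église"
for_third_party = to_ascii(original)
```

Note that step 4 silently drops characters it cannot map (for example, non-Latin scripts), so it may be worth logging any address whose converted form comes out shorter than expected.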

devstuff
Ah, well, that's not gonna be possible... The software I pass the addresses to prints mailing labels :)
Andrew Rollings