Is anyone aware of a simple way to anglicize a string? Currently, in our system, we're doing replacements on "invalid" characters, as shown below:

        ret = ret.Replace("ä", "ae");
        ret = ret.Replace("Ä", "Ae");
        ret = ret.Replace("ß", "ss");
        ret = ret.Replace("ç", "c");
        ret = ret.Replace("Ç", "C");
        ret = ret.Replace("Ž", "Z");

The issue here is that as we're opening the business up in additional countries (Turkey, Russia, Hungary...), we're finding that there's a whole slew of characters that this process does not convert.

Is anyone aware of any sort of solution that would allow us to not depend on a table of "invalid" characters?

Also, if it helps, we're using C# to code. :)

Thanks!


edit:

In response to some comments: our system does support the full set of Unicode characters... however, other systems that we integrate with (such as card processors) do not. :(

+3  A: 

Check out this question and its answers and take a look at this blog entry on converting diacritical characters to their ASCII equivalents.

luvieere
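The linked question and blog entry are not reproduced here. A common technique for stripping diacritics in .NET, which may or may not be exactly what they describe, is to decompose the string with Unicode normalization and drop the combining marks. The sketch below assumes that approach; the class and method names (`Anglicizer`, `RemoveDiacritics`) are made up for illustration.

```csharp
using System;
using System.Globalization;
using System.Text;

public static class Anglicizer
{
    // Hypothetical helper: removes combining marks after canonical decomposition.
    public static string RemoveDiacritics(string input)
    {
        // FormD splits a precomposed letter such as "Ä" into "A" + U+0308 (combining diaeresis).
        string decomposed = input.Normalize(NormalizationForm.FormD);

        var sb = new StringBuilder(decomposed.Length);
        foreach (char c in decomposed)
        {
            // Keep everything except the non-spacing (combining) marks.
            if (CharUnicodeInfo.GetUnicodeCategory(c) != UnicodeCategory.NonSpacingMark)
            {
                sb.Append(c);
            }
        }

        // Recompose whatever is left into the usual precomposed form.
        return sb.ToString().Normalize(NormalizationForm.FormC);
    }
}
```

For example, `Anglicizer.RemoveDiacritics("ÄçŽ")` gives `"AcZ"`. Letters that have no canonical decomposition, such as ß, æ and œ, pass through unchanged, so this alone does not produce pure ASCII.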
I've actually just tried that method, and it doesn't seem to catch every character: æœÄŒæßüÿt° is converted to æœAŒæßuyt°. Also, ö, which I would expect to be anglicized to oe, changes to simply o.
Tyllyn
@Tyllyn: In fact the translation can also be language dependent. In Swedish "ö" is mapped to "o", whereas in German you would represent it as "oe".
0xA3
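One way to handle that language dependence is to apply a small per-language override table. This is only a sketch: the tables are illustrative and far from complete, the language codes and the names `LanguageAwareAnglicizer` and `Anglicize` are assumptions, and the input is assumed to contain precomposed characters.

```csharp
using System;
using System.Collections.Generic;
using System.Text;

public static class LanguageAwareAnglicizer
{
    // Illustrative override tables only; a real system would need many more entries.
    private static readonly Dictionary<string, Dictionary<char, string>> Overrides =
        new Dictionary<string, Dictionary<char, string>>
        {
            ["de"] = new Dictionary<char, string>
            {
                ['ä'] = "ae", ['ö'] = "oe", ['ü'] = "ue",
                ['Ä'] = "Ae", ['Ö'] = "Oe", ['Ü'] = "Ue", ['ß'] = "ss"
            },
            ["sv"] = new Dictionary<char, string>
            {
                ['ö'] = "o", ['Ö'] = "O"
            }
        };

    // Replaces characters listed for the given language; everything else is copied as-is.
    public static string Anglicize(string input, string language)
    {
        Overrides.TryGetValue(language, out var table);

        var sb = new StringBuilder(input.Length);
        foreach (char c in input)
        {
            if (table != null && table.TryGetValue(c, out var replacement))
            {
                sb.Append(replacement);
            }
            else
            {
                sb.Append(c);
            }
        }
        return sb.ToString();
    }
}
```

With these tables, `Anglicize("Köln", "de")` gives `"Koeln"` while `Anglicize("Malmö", "sv")` gives `"Malmo"`. The catch is that you need to know the language of each string, which a name or address field usually does not tell you.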
@divo: Good lord, that makes everything even more confusing. :<
Tyllyn
A: 

Just because a letter looks similar to a traditional English letter does not make it equivalent. What is the business case for not just supporting Unicode and any characters your audience chooses to use?

richardtallent
Our mail server (which we are changing soon) doesn't support characters outside of the [a-zA-Z0-9] set for usernames. And the card processors we're using don't support them at some point either. In our own business practice, we have not restricted ourselves to this limited character set... and, well, it has caused problems when going to other systems. :(
Tyllyn
+1  A: 

As an answer to the modified problem (mail server supports only alphanumeric characters in usernames):

Let the users choose their own usernames, allowing only alphanumeric characters. They probably know best how to "anglicize" it.

Amnon
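A check along those lines could look like the sketch below. It assumes the rule is exactly the [a-zA-Z0-9] set mentioned in the comments; `UsernameRules` and `IsValid` are hypothetical names.

```csharp
using System;
using System.Text.RegularExpressions;

public static class UsernameRules
{
    // \A and \z anchor the whole string; unlike $, \z does not tolerate a trailing newline.
    private static readonly Regex AsciiAlphanumeric = new Regex(@"\A[a-zA-Z0-9]+\z");

    // True only for non-empty strings made up entirely of ASCII letters and digits.
    public static bool IsValid(string username)
    {
        return username != null && AsciiAlphanumeric.IsMatch(username);
    }
}
```

The explicit character class is deliberate: `\w` and `char.IsLetterOrDigit` also accept non-ASCII letters such as ö, which is exactly what the mail server cannot handle.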
We are going that route, to ensure that new usernames are properly restricted to [a-zA-Z0-9], but, at least for the email part, that does not let us handle pre-existing usernames. Also, with one of the card processors, we send them a file that needs to be "anglicized" beforehand. Fields that need to be converted this way include address and name. We could let the user enter a properly anglicized version, but this would most definitely slow these operations down from the rate we have now, affecting the business as a whole. We want to have as little user involvement as possible.
Tyllyn