ansaurus

Question

How to convert a Unicode character to its ASCII equivalent

Answer 1

A:

Hmm … I'm not sure which character you mean. The caret (“^”, CIRCUMFLEX ACCENT) has the same code in ASCII and Unicode (U+005E).

/EDIT: Damn, my fault. 710 (U+02C6) is actually the MODIFIER LETTER CIRCUMFLEX ACCENT. Unfortunately, this character isn't part of ASCII at all. It might look like the normal caret but it's a different character. Simple conversion won't help here. I'm not sure if .NET supports mapping of similar characters when converting from Unicode. Worth investigating, though.

Konrad Rudolph 2008-09-26 09:29:29

I just edited the post to reflect what the OP meant. :-)

Chris Jester-Young 2008-09-26 09:31:22

Unicode != UTF-8

OJ 2008-09-26 09:32:45

OJ: What has this got to do with UTF-8?

Chris Jester-Young 2008-09-26 09:34:45

@OJ, I'm aware of that. However, the code point of a character is the same in all Unicode encodings.

Konrad Rudolph 2008-09-26 09:35:16

@Chris: In Konrad's original post he talked about UTF8, not Unicode.

OJ 2008-09-26 09:37:24

@OJ: Ah, gotcha.

Chris Jester-Young 2008-09-26 09:38:50

You're right, it is indeed MODIFIER LETTER CIRCUMFLEX ACCENT, see my edits.

Huppie 2008-09-26 09:59:52

Answer 2

A:

Try doing a search and replace:

myInput = myInput.Replace('ˆ', '^');

Chris Jester-Young 2008-09-26 09:33:58

So what about other Unicode characters in that case? ;-)

Huppie 2008-09-26 09:52:26

Answer 3

A:

The value 63 is the question mark, AKA "I am not able to display this character in ASCII".
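A minimal sketch of that behaviour, assuming the original attempt went through Encoding.ASCII (the class and method names here are made up for illustration). Encoding.ASCII silently replaces anything outside the 7-bit range with '?', which is byte 63:

```csharp
using System;
using System.Text;

public static class AsciiFallbackDemo
{
    // Encodes text with Encoding.ASCII, whose default fallback replaces
    // every character outside the 7-bit range with '?'.
    public static byte[] ToAscii(string text)
    {
        return Encoding.ASCII.GetBytes(text);
    }

    public static void Main()
    {
        byte[] bytes = ToAscii("\u00EA"); // ê, LATIN SMALL LETTER E WITH CIRCUMFLEX
        Console.WriteLine(bytes[0]);      // prints 63, the code of '?'
    }
}
```

So a 63 in the output usually means the information was already lost at encoding time, not that the stored value was a question mark.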

Timbo 2008-09-26 09:36:04

So, you're pinpointing my problem. The question is how DO I do this? I know the method I tried does not work.

Huppie 2008-09-26 09:59:11

Answer 4

+2 A:

You cannot use the default ASCII encoding (Encoding.ASCII) here, but must create the encoding with the appropriate code page using Encoding.GetEncoding(...). You might try to use code page 1252, which is a superset of ISO 8859-1.

csgero 2008-09-26 10:05:18

Like so: byte[] bytes = Encoding.GetEncoding(437).GetBytes("ê");

Huppie 2008-09-26 10:34:34

Answer 5

+1 A:

ASCII does not define ê. The number 136 is the code of the circumflex (ˆ, U+02C6) in 8-bit encodings such as Windows-1252.

Can you verify that a small e with a circumflex (ê) is actually what is supposed to be stored in the Access database in this case? Perhaps U+02C6 U+0065 is the result of a conversion error, where the input is actually an e followed by a circumflex, or something else entirely. Perhaps your Access database has corrupt data in the sense that the designated encoding does not match the contents, in which case the .NET client might incorrectly parse the data (using the wrong decoder).

If this error is indeed introduced during the reading from the database, perhaps pasting some code or configuration settings might help.

In Code page 437, character number 136 is an e with a circumflex.
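One way to check which code page the data was written in is to decode the same byte under both candidates. The sketch below is only an illustration with made-up names. The RegisterProvider call is an assumption about the runtime: it is needed on .NET Core and .NET 5+, where legacy code pages are not available by default, and can be dropped on the classic .NET Framework.

```csharp
using System;
using System.Text;

public static class CodePageCompare
{
    // Decodes a single byte under the given code page.
    public static string Decode(int codePage, byte value)
    {
        // Assumption: running on .NET Core / .NET 5+, where legacy code
        // pages must be registered first. Calling this repeatedly is harmless.
        Encoding.RegisterProvider(CodePagesEncodingProvider.Instance);
        return Encoding.GetEncoding(codePage).GetString(new[] { value });
    }

    public static void Main()
    {
        string as1252 = Decode(1252, 136);
        string as437 = Decode(437, 136);
        Console.WriteLine("1252: U+{0:X4}", (int)as1252[0]); // U+02C6, the modifier circumflex
        Console.WriteLine("437:  U+{0:X4}", (int)as437[0]);  // U+00EA, e with circumflex
    }
}
```

If the decoded text only looks right under one of the two, that is a strong hint about which encoding the writer used.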

bzlm 2008-09-26 10:06:11

Thanks! Your tip helped a lot, it was in fact codepage 437 (MS-DOS). Using Encoding.GetEncoding(437) it worked.

Huppie 2008-09-26 10:22:04

Answer 6

+6 A:

Okay, let's elaborate. Both csgero and bzlm pointed in the right direction.

Because of bzlm's reply I looked up the Windows-1252 page on Wikipedia and found that it's called a code page. The Wikipedia article on code pages states the following:

No formal standard existed for these ‘extended character sets’; IBM merely referred to the variants as code pages, as it had always done for variants of EBCDIC encodings.

This led me to codepage 437:

In ASCII-compatible code pages, the lower 128 characters maintained their standard US-ASCII values, and different pages (or sets of characters) could be made available in the upper 128 characters. DOS computers built for the North American market, for example, used code page 437, which included accented characters needed for French, German, and a few other European languages, as well as some graphical line-drawing characters.

So code page 437 was what I had been calling 'extended ASCII'. It has ê as character 136, and the other characters I looked up seem right as well.

csgero supplied the Encoding.GetEncoding() hint, which I used to write the following statement. It solves my problem:

byte[] bytes = Encoding.GetEncoding(437).GetBytes("ê");
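For characters that code page 437 does not contain, the one-liner above will quietly substitute something else. A slightly stricter sketch, with hypothetical names, asks for an exception instead by using the GetEncoding overload that takes fallbacks. The RegisterProvider line is again an assumption for .NET Core and .NET 5+, and is not needed on the classic .NET Framework.

```csharp
using System;
using System.Text;

public static class Cp437Strict
{
    // Encodes text to code page 437 and throws EncoderFallbackException
    // when a character has no mapping, instead of substituting silently.
    public static byte[] Encode(string text)
    {
        // Assumption: modern .NET, where legacy code pages must be registered.
        Encoding.RegisterProvider(CodePagesEncodingProvider.Instance);
        Encoding cp437 = Encoding.GetEncoding(
            437,
            EncoderFallback.ExceptionFallback,
            DecoderFallback.ExceptionFallback);
        return cp437.GetBytes(text);
    }

    public static void Main()
    {
        Console.WriteLine(Encode("\u00EA")[0]); // prints 136

        try
        {
            Encode("\u20AC"); // the euro sign is not part of code page 437
        }
        catch (EncoderFallbackException ex)
        {
            Console.WriteLine("Not representable in CP437: " + ex.Message);
        }
    }
}
```

Whether to throw or to substitute depends on the data. Throwing makes bad input visible early, which seems the safer default when the bytes end up in a database.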

Huppie 2008-09-26 20:37:17

Answer 7

A:

Using WordPress functions, read:

[solution][howto] Convert special characters to normal chars (é to e)

which converts text like: testáén to testaen :)

Ramon Fincken 2009-11-17 12:03:41

This... must be the most worthless reply in a topic ever! I'm speechless. This already borders on heroic ingeniousness! You managed to hit at least 4 unrelated reasons NOT to post this: the question was finished more than a year ago; the OP is working in .NET; the question is in no way whatsoever connected to WordPress; the OP doesn't want to get rid of "bad" characters but to retain them correctly. WTF?!

Vilx- 2009-11-17 12:10:36
