ansaurus

Question

Answer 1

A:

All text stored in a string is in fact Unicode. Read: Strings in .Net and C# and The Absolute Minimum Every Software Developer Absolutely, Positively Must Know About Unicode and Character Sets (No Excuses!)

Edit: I suppose you want the Convert function to automatically change \x11 to \u25c0. The problem is that \x11 is valid in almost every encoding; the differences usually start at character \x80. So the Convert function will leave it unchanged even if you do this:

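string reEncodedString = null;
// .NET strings are UTF-16, so start from their UTF-16 bytes.
byte[] unicodeBytes = Encoding.Unicode.GetBytes(value);
// \x11 exists in both encodings, so Convert leaves it unchanged.
byte[] sourceBytes = Encoding.Convert(Encoding.Unicode,
                                sourceEncoding, unicodeBytes);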

You can find the mappings from CP850 to Unicode at unicode.org. Those mappings keep the control codes as control codes, so for this conversion to happen you will have to replace these characters manually.
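
A minimal sketch of what such a manual replacement could look like. The class name, method name, and lookup table are hypothetical, and the table contains only the two pairs mentioned in this thread (\x10 to \u25ba and \x11 to \u25c0); check any further entries against a code page chart before relying on them.

```csharp
using System.Collections.Generic;
using System.Text;

static class ControlCharMapper
{
    // Hypothetical table: only the two pairs quoted in this thread.
    static readonly Dictionary<char, char> Glyphs = new Dictionary<char, char>
    {
        { '\u0010', '\u25ba' },
        { '\u0011', '\u25c0' },
    };

    // Replaces each mapped control character with its glyph and
    // copies every other character through unchanged.
    public static string ReplaceControlChars(string value)
    {
        var sb = new StringBuilder(value.Length);
        foreach (char c in value)
        {
            char mapped;
            sb.Append(Glyphs.TryGetValue(c, out mapped) ? mapped : c);
        }
        return sb.ToString();
    }
}
```

Calling ControlCharMapper.ReplaceControlChars(value) before serializing would keep the raw control characters out of the message, assuming the receiving side is happy to get the glyphs instead.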

jmservera 2009-05-19 12:38:44

Answer 2

A:

It seems you think there is a problem because of an incorrect understanding. jmservera is correct: all strings in .NET are encoded internally as Unicode (UTF-16).

You didn't say exactly what you want to accomplish. Are you experiencing a problem at the other end of the wire?

Just FYI, you can set the text encoding on a WCF binding with the textMessageEncoding element in the config file.

Cheeso 2009-05-19 13:04:52

The problem occurs during Java client-side decoding. When a string contains \x10 or \x11, WCF wrongly permits encoding them as &#x10; and &#x11;, which are not valid XML characters according to the XML specification. I saw here [http://en.wikipedia.org/wiki/Code_page_850] that char \x10 in code page 850 corresponds to char \u25ba, so I thought that an encoding conversion would solve my problem.

2009-05-19 14:26:17

According to the XML spec, processors are required to handle UTF-8 and UTF-16. So can you not encode as UTF-8 and ship your characters across the wire to the client side?

Cheeso 2009-05-19 14:40:51

No. When I serialize \x10 the result is &#x10;, which is a valid Unicode char but not a valid XML char. In other words, the XML spec accepts UTF-* chars except for certain character ranges. See here [http://www.w3.org/TR/2008/REC-xml-20081126/#charsets]

2009-05-20 17:25:41
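
The character ranges from the XML 1.0 spec linked in the comment above can be checked before a string is serialized. The following is a rough sketch, not something proposed in the thread: the class and method names are hypothetical, and dropping the offending characters is only one possible policy (replacing them, as discussed in Answer 1, is another).

```csharp
using System.Text;

static class XmlCharFilter
{
    // XML 1.0 Char production, restricted to single UTF-16 code units:
    // #x9 | #xA | #xD | [#x20-#xD7FF] | [#xE000-#xFFFD]
    static bool IsValidXmlChar(char c)
    {
        return c == 0x9 || c == 0xA || c == 0xD
            || (c >= 0x20 && c <= 0xD7FF)
            || (c >= 0xE000 && c <= 0xFFFD);
    }

    // Drops every character that XML 1.0 does not allow.
    public static string StripInvalidXmlChars(string value)
    {
        var sb = new StringBuilder(value.Length);
        for (int i = 0; i < value.Length; i++)
        {
            if (char.IsSurrogatePair(value, i))
            {
                // A well-formed pair encodes #x10000-#x10FFFF, which is allowed.
                sb.Append(value[i]).Append(value[i + 1]);
                i++;
            }
            else if (IsValidXmlChar(value[i]))
            {
                sb.Append(value[i]);
            }
            // Anything else (e.g. \x10, \x11, lone surrogates) is skipped.
        }
        return sb.ToString();
    }
}
```

With this filter, \x10 and \x11 never reach the serializer, so the Java client has nothing invalid to reject. The cost is that the information carried by those characters is lost.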

Answer 3

A:

I suspect this line may be your culprit:

reEncodedString = sourceEncoding.GetString(targetBytes);

It takes the bytes that were converted to the target encoding and asks sourceEncoding to decode them into a string. I've not had a chance to verify it, but I suspect the following might be better:

reEncodedString = targetEncoding.GetString(targetBytes);
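
Put together, the corrected round trip could look like the sketch below. The ReEncoder class and ReEncode method are hypothetical names; the variable names follow the snippets in this thread. It also illustrates the point from Answer 1: because a .NET string is Unicode either way, the round trip hands back the same characters, control characters included.

```csharp
using System.Text;

static class ReEncoder
{
    // Converts the string's UTF-16 bytes to targetEncoding, then decodes
    // them with the SAME encoding that produced them.
    public static string ReEncode(string value, Encoding targetEncoding)
    {
        byte[] unicodeBytes = Encoding.Unicode.GetBytes(value);
        byte[] targetBytes = Encoding.Convert(Encoding.Unicode,
                                              targetEncoding, unicodeBytes);
        return targetEncoding.GetString(targetBytes);
    }
}
```

For example, ReEncoder.ReEncode("a\u0011", Encoding.UTF8) returns the input unchanged, so the \x11 is still there afterwards.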

Lazarus 2009-05-19 14:02:13

Answer 4

A:

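byte[] sourceBytes = Encoding.Default.GetBytes(value);
string decoded = Encoding.UTF8.GetString(sourceBytes);

This sequence is useful when downloading a Unicode file from a service (for example, an XML file that contains Persian characters). It helps when UTF-8 bytes were mistakenly decoded with the system default encoding.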

yoohoo 2010-01-10 14:13:37

Encoding Conversion problem
