Hi all,

I've got a little problem changing the encoding of a string. I read strings from a DB that are encoded using code page 850, and I have to prepare them so that they are suitable for an interoperable WCF service.

From the DB I read the characters \x10 and \x11 (triangular shapes), and I want to convert them to their Unicode equivalents in order to prevent serialization/deserialization problems during the WCF call. (The character references &#x10; and &#x11; are not valid according to the XML spec, even though WCF serializes them.)

Right now I use the following code to convert the string encoding, but nothing happens. The result string is in fact identical to the original one.

I'm probably missing something...

Please help me!!!

Emanuele

 static class UnicodeEncodingExtension
    {
        public static string Convert(this Encoding sourceEncoding, Encoding targetEncoding, string value)
        {
            string reEncodedString = null;

            byte[] sourceBytes = sourceEncoding.GetBytes(value);
            byte[] targetBytes = Encoding.Convert(sourceEncoding, targetEncoding, sourceBytes);
            reEncodedString = sourceEncoding.GetString(targetBytes);

            return reEncodedString;
        }

    }

    class Program
    {
        private static Encoding Cp850Encoding = Encoding.GetEncoding(850);
        private static Encoding UnicodeEncoding = Encoding.UTF8;

        static void Main(string[] args)
        {
            string value;
            string resultValue;
            value = "\x10";
            resultValue = Cp850Encoding.Convert(UnicodeEncoding, value);

            value = "\x11";
            resultValue = Cp850Encoding.Convert(UnicodeEncoding, value);

            value = "\u25b6";
            resultValue = UnicodeEncoding.Convert(Cp850Encoding, value);

            value = "\u25c0";
            resultValue = UnicodeEncoding.Convert(Cp850Encoding, value);

        }

    }
A: 

All strings stored in a .NET string are in fact Unicode. Read: Strings in .Net and C# and The Absolute Minimum Every Software Developer Absolutely, Positively Must Know About Unicode and Character Sets (No Excuses!)

Edit: I suppose that you want the Convert function to automatically change \x11 to \u25c0, but the problem here is that \x11 is valid in almost any encoding. The differences between encodings usually start at character \x80, so the Convert function will keep \x11 as it is even if you do this:

string reEncodedString = null;
byte[] unicodeBytes = UnicodeEncoding.Unicode.GetBytes(value);
byte[] sourceBytes = Encoding.Convert(Encoding.Unicode,
                                sourceEncoding, unicodeBytes);

You can find the mappings from CP850 to Unicode at unicode.org. So, for this conversion to happen, you will have to change these characters manually.
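As a minimal sketch of what "manually" could look like: the class and method names below are hypothetical, and the table only covers the two characters discussed in this thread. The code points follow the code page 850 glyph table on Wikipedia that is cited elsewhere in the thread (U+25BA for \x10, U+25C4 for \x11); adjust them if you prefer other triangle characters.

```csharp
using System.Collections.Generic;
using System.Text;

static class Cp850Glyphs
{
    // Assumed mapping from control characters to the glyphs code page 850
    // displays for them. Only the two characters from this thread are listed.
    private static readonly Dictionary<char, char> ControlToGlyph = new Dictionary<char, char>
    {
        { '\u0010', '\u25BA' }, // right-pointing pointer
        { '\u0011', '\u25C4' }, // left-pointing pointer
    };

    // Returns a copy of value in which every mapped control character is
    // replaced by its glyph; all other characters are copied unchanged.
    public static string ReplaceControlGlyphs(string value)
    {
        if (string.IsNullOrEmpty(value))
        {
            return value;
        }

        var builder = new StringBuilder(value.Length);
        foreach (char c in value)
        {
            char glyph;
            builder.Append(ControlToGlyph.TryGetValue(c, out glyph) ? glyph : c);
        }
        return builder.ToString();
    }
}
```

Calling Cp850Glyphs.ReplaceControlGlyphs on each string read from the DB, before handing it to WCF, would keep the replacement in one place.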

jmservera
A: 

It seems you think there is a problem based on an incorrect understanding. But jmservera is correct: all strings in .NET are encoded internally as Unicode.

You didn't say exactly what you want to accomplish. Are you experiencing a problem at the other end of the wire?

Just FYI, you can set the text encoding on a WCF binding with the textMessageEncoding element in the config file.

Cheeso
The problem occurs during decoding on the Java client side. When a string contains \x10 or \x11, WCF wrongly permits encoding them as &#x10; and &#x11;, which are not valid XML characters according to the XML specification. I saw here [http://en.wikipedia.org/wiki/Code_page_850] that char \x10 in code page 850 corresponds to char \u25ba, so I thought that an encoding conversion would solve my problem.
According to the XML spec, processors are required to handle UTF-8 and UTF-16. So, can you not encode as UTF-8 and ship your characters across the wire to the client side?
Cheeso
No. When I serialize \x10 the result is &#x10;, which is a valid Unicode char but not a valid XML char. In other words, the XML spec handles UTF-* chars except for certain character ranges. See here [http://www.w3.org/TR/2008/REC-xml-20081126/#charsets]
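For reference, the character ranges allowed by the XML 1.0 Char production linked above can be checked before a string is sent. This is only a sketch; the class and method names are hypothetical and not part of WCF.

```csharp
static class XmlCharCheck
{
    // XML 1.0 Char production:
    // #x9 | #xA | #xD | [#x20-#xD7FF] | [#xE000-#xFFFD] | [#x10000-#x10FFFF]
    public static bool IsValidXmlChar(int codePoint)
    {
        return codePoint == 0x9
            || codePoint == 0xA
            || codePoint == 0xD
            || (codePoint >= 0x20 && codePoint <= 0xD7FF)
            || (codePoint >= 0xE000 && codePoint <= 0xFFFD)
            || (codePoint >= 0x10000 && codePoint <= 0x10FFFF);
    }

    // Returns the index of the first character that XML 1.0 does not allow,
    // or -1 if the whole string is safe to serialize.
    public static int FirstInvalidIndex(string value)
    {
        for (int i = 0; i < value.Length; i++)
        {
            int codePoint;
            bool isPair = char.IsSurrogatePair(value, i);
            if (isPair)
            {
                codePoint = char.ConvertToUtf32(value, i);
            }
            else
            {
                // A lone surrogate falls in D800-DFFF and is reported as invalid.
                codePoint = value[i];
            }

            if (!IsValidXmlChar(codePoint))
            {
                return i;
            }

            if (isPair)
            {
                i++; // skip the low surrogate of the pair
            }
        }
        return -1;
    }
}
```

With this check, "\u0010" is reported as invalid while "\u25ba" passes, which matches the behavior described in the comments above.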
A: 

I suspect this line may be your culprit:

reEncodedString = sourceEncoding.GetString(targetBytes);

which seems to take your target-encoded bytes and ask your sourceEncoding to make a string out of them. I've not had a chance to verify it, but I suspect the following might be better:

reEncodedString = targetEncoding.GetString(targetBytes);
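Note that with this change the function becomes a plain round trip: for any character both encodings can represent, the result equals the input, so \x10 still comes back as \x10. A small sketch of that behavior follows. It uses Encoding.Unicode and Encoding.UTF8 instead of code page 850, an assumption made only so the sketch runs without registering extra code page providers; the class name is hypothetical.

```csharp
using System.Text;

static class RoundTripDemo
{
    // Same steps as the extension method in the question, but decoding the
    // converted bytes with the target encoding.
    public static string Convert(Encoding sourceEncoding, Encoding targetEncoding, string value)
    {
        byte[] sourceBytes = sourceEncoding.GetBytes(value);
        byte[] targetBytes = Encoding.Convert(sourceEncoding, targetEncoding, sourceBytes);
        return targetEncoding.GetString(targetBytes);
    }

    // True when converting UTF-16 bytes to UTF-8 and decoding them again
    // yields exactly the original string.
    public static bool IsUnchanged(string value)
    {
        return Convert(Encoding.Unicode, Encoding.UTF8, value) == value;
    }
}
```

So the corrected line removes a real bug, but it does not by itself turn the control characters into the triangle glyphs.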
Lazarus
A: 
  1. byte[] sourceBytes =Encoding.Default.GetBytes(value)
  2. Encoding.UTF8.GetString(sourceBytes)

This sequence is useful for downloading a Unicode file from a service (for example, an XML file that contains Persian characters).

yoohoo