Well, sorry about the confusing title, but I'm having a slightly annoying problem with character encoding in C#/.NET.

I have a bunch of classes generated from WSDL files. These classes have methods that take string parameters, which are then submitted to a remote web service. This remote web service expects all text input to be UTF-8 encoded. Now, as far as I can tell, there really isn't a way to make a string in C#/.NET UTF-8 encoded: it's UTF-16 or nothing, and if I want UTF-8 I have to make it a byte[], right?

So, my big question is: how am I supposed to put my raw UTF-8 byte[] data into a string so I can actually submit it to the web service? I mean, sure, I could probably fall back on C-style code, looping through the whole thing byte by byte, but surely Microsoft must have thought about this when designing the language and API? (Although since my Vista laptop thinks it's perfectly alright to use UTF-16 internally, cp1252 for some stuff, UTF-8 for some other stuff and cp850(!) for yet other stuff, I wouldn't be too surprised if they didn't.)

So, am I stuck doing things the ugly way, or is there some hidden System.Text.EncodeStuffTherightWay.EncodeStringAsUTF8(string) method deep in the bowels of .NET?

+4  A: 

Strings never contain anything UTF-* or anything else encoded; that isn't their job. They are strings: sequences of character/code-point data. The byte[] that you have is the encoded form.
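To make that distinction concrete, here is a small sketch. The class and method names (EncodingSizes, ByteCount) are made up for illustration; only System.Text.Encoding and its GetBytes method come from the framework. It takes a single string and shows that the byte count depends on which encoding you ask for, not on the string itself.

```csharp
using System;
using System.Text;

// Hypothetical helper, for illustration only.
public static class EncodingSizes
{
    // Number of bytes the text occupies once encoded with the given encoding.
    public static int ByteCount(string text, Encoding encoding)
    {
        return encoding.GetBytes(text).Length;
    }

    public static void Main()
    {
        string text = "\u20AC"; // the euro sign, a single char in a .NET string

        Console.WriteLine(text.Length);                        // chars in the string
        Console.WriteLine(ByteCount(text, Encoding.UTF8));     // bytes when encoded as UTF-8
        Console.WriteLine(ByteCount(text, Encoding.Unicode));  // bytes when encoded as UTF-16
    }
}
```

The string is the same object in all three lines; an encoding only enters the picture at the moment it is turned into bytes.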

In almost any scenario I can think of, the transport etc. should be doing this for you already. If it isn't, then that sounds like a bug in either the WSDL or the web-service stack itself.

Keep in mind that WSDL itself just has xs:string. If that isn't sufficient (i.e. if that in combination with the handshake isn't enough), then it simply isn't a web-service string.

The alternative is to throw it around as a byte[], and encode manually via:

byte[] bytes = Encoding.UTF8.GetBytes(yourString); // needs: using System.Text;
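As a rough sketch of what that manual route looks like in both directions (Utf8RoundTrip and its methods are hypothetical names, not part of .NET), the following encodes a string to UTF-8 bytes, decodes it back, and also shows what goes wrong if UTF-8 bytes are pushed into a string using the wrong encoding, which is roughly what the question is trying to do:

```csharp
using System;
using System.Text;

// Hypothetical helper, for illustration only.
public static class Utf8RoundTrip
{
    // Encode to UTF-8 bytes, then decode those bytes back into a string.
    public static string RoundTrip(string text)
    {
        byte[] bytes = Encoding.UTF8.GetBytes(text);
        return Encoding.UTF8.GetString(bytes);
    }

    // Decode UTF-8 bytes as if they were ISO-8859-1: each byte becomes one char.
    public static string MisdecodeAsLatin1(string text)
    {
        byte[] bytes = Encoding.UTF8.GetBytes(text);
        return Encoding.GetEncoding("iso-8859-1").GetString(bytes);
    }

    public static void Main()
    {
        string original = "\u20AC"; // the euro sign

        Console.WriteLine(RoundTrip(original) == original);      // same text comes back
        Console.WriteLine(MisdecodeAsLatin1(original).Length);   // one char per UTF-8 byte
    }
}
```

The second method is the thing to avoid: a string built that way no longer holds the original text, so whatever the web-service stack sends for it will be wrong.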
Marc Gravell