How do you convert a byte array to a string? I need to get the raw content, e.g. "96=A8=FC-=A8=FE", but when I use, say, Encoding.UTF8.GetString(bytes), it returns "96��-��". Thanks!

+1  A: 

You probably want to use the ASCII encoding rather than UTF-8.

John Gietzen
+1  A: 

You effectively want a formatted string of the hexadecimal representation of each byte. The question "How do you convert Byte Array to Hexadecimal String, and vice versa, in C#?" will show you how to get the string, and you can alter that code to add any "in-between-bytes" formatting you want.
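
As a rough illustration of that kind of formatting (not the code from the linked question), here is a minimal sketch. The class name `HexFormat` and the method `ToPrefixedHex` are hypothetical names chosen for this example.

```csharp
using System;
using System.Text;

static class HexFormat
{
    // Hypothetical helper: writes every byte as two uppercase hex digits,
    // each preceded by the given prefix (for example "=").
    public static string ToPrefixedHex(byte[] bytes, string prefix)
    {
        var sb = new StringBuilder(bytes.Length * (2 + prefix.Length));
        foreach (byte b in bytes)
        {
            sb.Append(prefix).Append(b.ToString("X2"));
        }
        return sb.ToString();
    }
}
```

Calling `HexFormat.ToPrefixedHex(new byte[] { 0xA8, 0xFC }, "=")` yields "=A8=FC". Note that this formats every byte, so the ASCII digits "96" would come out as "=39=36" rather than being left as they are.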

Adam Wright
Thanks Adam! This was also helpful.
ban-G
+7  A: 

I think you misunderstand the content of strings. The closest you'll get to "raw content" is to use Encoding.Unicode - .NET uses UTF-16 internally, so converting to UTF-16 is effectively just a case of copying the contents of memory from the string to the byte array.
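
A small sketch of that UTF-16 round trip, assuming nothing beyond the standard `Encoding.Unicode` (little-endian UTF-16). The class and method names are made up for illustration.

```csharp
using System;
using System.Text;

static class Utf16Demo
{
    // Encoding.Unicode is little-endian UTF-16: two bytes per UTF-16 code unit.
    public static byte[] ToUtf16Bytes(string text)
    {
        return Encoding.Unicode.GetBytes(text);
    }

    // Reverses ToUtf16Bytes; lossless for any well-formed string.
    public static string FromUtf16Bytes(byte[] bytes)
    {
        return Encoding.Unicode.GetString(bytes);
    }
}
```

For the string "96\u215C" (that is, "96⅜"), `BitConverter.ToString(Utf16Demo.ToUtf16Bytes(...))` gives "39-00-36-00-5C-21", and decoding those bytes with the same encoding returns the original string.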

Now, to come back to your problem, what data do you have, what is it meant to represent and why? Textual data is characters. Binary data is numbers, basically. You have to have a mapping between the two, and that's the encoding.

I have an article on Unicode which may help you, but I strongly suspect you'll need to take a step back before you make any progress.

If you're trying to convert a byte array into a string representation of those bytes as hex, you can just use BitConverter.ToString(byte[]), but I wouldn't describe that as a "raw" conversion.
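
For reference, a minimal sketch of what `BitConverter.ToString(byte[])` produces. The wrapper class `BitConverterDemo` is only there to keep the example self-contained.

```csharp
using System;

static class BitConverterDemo
{
    // BitConverter.ToString separates the two-digit uppercase hex pairs with '-'.
    public static string ToDashedHex(byte[] bytes)
    {
        return BitConverter.ToString(bytes);
    }

    // Removing the separators gives one continuous hex string.
    public static string ToPlainHex(byte[] bytes)
    {
        return BitConverter.ToString(bytes).Replace("-", "");
    }
}
```

With the bytes `{ 0x39, 0x36, 0xA8, 0xFC }`, `ToDashedHex` returns "39-36-A8-FC" and `ToPlainHex` returns "3936A8FC".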

EDIT: Okay, now that we have the context, it's much easier to answer. What you're looking at is quoted-printable (QP) encoding. The email should specify the character encoding of the quoted-printable content, so once you've decoded the QP layer back into bytes, that's the encoding you should use to turn those bytes into text. If you're not currently storing the content encoding of the original email, you should start doing so right now...
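
To make the two layers concrete, here is a minimal sketch of a quoted-printable decoder. It is an illustration under assumptions, not a complete MIME implementation: the class `QuotedPrintable` and its methods are hypothetical, the input is assumed to be ASCII text, and only "=XX" escapes and soft line breaks are handled.

```csharp
using System;
using System.Collections.Generic;
using System.Text;

static class QuotedPrintable
{
    // Step 1: undo the QP layer, producing the original raw bytes.
    public static byte[] DecodeToBytes(string qp)
    {
        var output = new List<byte>(qp.Length);
        int i = 0;
        while (i < qp.Length)
        {
            char c = qp[i];
            if (c == '=')
            {
                // Soft line break: "=" directly before LF or CRLF is dropped.
                if (i + 1 < qp.Length && qp[i + 1] == '\n') { i += 2; continue; }
                if (i + 2 < qp.Length && qp[i + 1] == '\r' && qp[i + 2] == '\n') { i += 3; continue; }

                // "=XX" escape: two hex digits encode one byte.
                if (i + 2 < qp.Length && Uri.IsHexDigit(qp[i + 1]) && Uri.IsHexDigit(qp[i + 2]))
                {
                    output.Add(Convert.ToByte(qp.Substring(i + 1, 2), 16));
                    i += 3;
                    continue;
                }
            }
            // Anything else is copied through (input assumed to be ASCII).
            output.Add((byte)c);
            i++;
        }
        return output.ToArray();
    }

    // Step 2: interpret the raw bytes using the charset the email declared.
    public static string Decode(string qp, Encoding charset)
    {
        return charset.GetString(DecodeToBytes(qp));
    }
}
```

`QuotedPrintable.DecodeToBytes("96=A8=FC-=A8=FE")` gives the bytes 39-36-A8-FC-2D-A8-FE. What text those bytes represent still depends on the charset passed to `Decode`, which has to come from the email itself.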

Jon Skeet
I think you're right to ask him to take a step back and reconsider the question. My best guess, though, is that he expects these encoded hex values to translate to characters that are viewable on his machine. If so, perhaps he needs to map from some local code page to Unicode.
Steven Sudit
That's just Encoding.Default, if that's really what he expects :)
Jon Skeet
What I need to do is parse the contents of an email. Before parsing, we save the contents in the database. That column is returned to me as a byte[]. My main problem is when the email contains fractions, e.g. "96⅜-⅞". The fraction part is converted into hexadecimal but the rest of the message stays the same. I already have code that takes care of the conversion (using a regular expression), but I don't know how to get "96=A8=FC-=A8=FE" from the returned column, which is a byte[]. Maybe my approach to solving this problem is unsatisfactory. Any suggestions would be appreciated. Thanks!
ban-G
Editing answer...
Jon Skeet
Just to complicate Jon's answer, not only do you need to check for the content-encoding being QP, you need to check for what charset to interpret it as. ASCII 0-127 is more or less consistent, but whether 0xA8 maps to a fraction character depends entirely on the charset. Frankly, this is not a trivial problem to solve; you need a deeper understanding of RFC 2822 and everything it touches. May I suggest giving up and using a decent library, like Chilkat's S/MIME decoder?
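
A small sketch of that charset dependence, using only encodings that .NET ships by default. The helper `CharsetDemo.DecodeAs` is a hypothetical name. On .NET Core and later, other legacy code pages are likely to need `CodePagesEncodingProvider` registered first.

```csharp
using System;
using System.Text;

static class CharsetDemo
{
    // Decodes the same bytes under whichever charset name is supplied.
    public static string DecodeAs(byte[] bytes, string charsetName)
    {
        return Encoding.GetEncoding(charsetName).GetString(bytes);
    }
}
```

For the single byte 0xA8, "us-ascii" yields "?", "iso-8859-1" yields "¨" (U+00A8), and "utf-8" yields the replacement character U+FFFD, so the byte alone does not tell you which character was meant.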
Steven Sudit
Thanks again Jon Skeet for your input and for sharing the link to your article about Unicode and .NET -- it was really helpful!
ban-G
Thanks also, Steven, for your input. I think my previous approach to solving this issue made things more complicated. Instead of converting the byte[] to a string and replacing the hex (fractions) with readable characters, since I know how the message was originally encoded, I converted the bytes to UTF-8 bytes and passed those to the parser. So far it's working, and the vulgar fractions are handled correctly. Anyway, encoding is really confusing. I need to understand how it works more. =)
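
One way such a byte-level conversion could look is sketched below with `Encoding.Convert`. The thread does not say which source charset was used, so it is a parameter here, and the class `Transcode` is a hypothetical name.

```csharp
using System;
using System.Text;

static class Transcode
{
    // Re-encodes bytes from a known source charset into UTF-8 bytes.
    // Which source charset applies is an assumption the caller must supply.
    public static byte[] ToUtf8(byte[] bytes, Encoding source)
    {
        return Encoding.Convert(source, Encoding.UTF8, bytes);
    }
}
```

For example, the ISO-8859-1 byte 0xA8 becomes the two UTF-8 bytes C2-A8 when passed through `Transcode.ToUtf8(bytes, Encoding.GetEncoding("iso-8859-1"))`.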
ban-G
A: 

You can also use the Convert.ToBase64CharArray method:

http://msdn.microsoft.com/en-us/library/system.convert.tobase64chararray(VS.80).aspx

Convert.ToBase64CharArray (Byte[], Int32, Int32, Char[], Int32)

Converts a subset of an 8-bit unsigned integer array to an equivalent subset of a Unicode character array encoded with base 64 digits. Parameters specify the subsets as offsets in the input and output arrays, and the number of elements in the input array to convert.
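
A minimal sketch of calling that overload, with the output buffer sized for padded base64 (4 characters per 3 input bytes, rounded up). The wrapper `Base64Demo.ToBase64` is a hypothetical name. Base64 is a different representation from the "=A8=FC" form in the question.

```csharp
using System;

static class Base64Demo
{
    public static string ToBase64(byte[] bytes)
    {
        // Padded base64 needs 4 chars for every group of up to 3 bytes.
        char[] buffer = new char[((bytes.Length + 2) / 3) * 4];

        // Arguments: input array, input offset, input length, output array, output offset.
        int written = Convert.ToBase64CharArray(bytes, 0, bytes.Length, buffer, 0);

        return new string(buffer, 0, written);
    }
}
```

The result matches `Convert.ToBase64String(bytes)`, which is usually simpler when the whole array is being converted to a string anyway.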

Anjisan