While there are 100 ways to solve the conversion problem, I am focusing on performance.

Given that the string only contains binary data, what is the fastest method of converting that data to a byte[] (not char[]) in C#?

Clarification: This is not ASCII data, rather binary data that happens to be in a string.

+4  A: 

UTF8Encoding.GetBytes

sixlettervariables
A: 

There is no such thing as an ASCII string in C#! Strings always contain UTF-16. Not realizing this leads to a lot of problems. That said, the methods mentioned before work because they consider the string as UTF-16 encoded and transform the characters to ASCII symbols.

/EDIT in response to the clarification: how did the binary data get in the string? Strings aren't supposed to contain binary data (use byte[] for that).

Konrad Rudolph
I think the user has a strange file format with mixed text and binary data.
Davy Landman
+3  A: 

I'm not sure ASCIIEncoding.GetBytes is going to do it, because it only supports the range 0x0000 to 0x007F.

You say the string contains only bytes. But a .NET string is an array of chars, and one char is 2 bytes (because .NET stores strings as UTF-16). So there are two possible situations for storing the bytes 0x42 and 0x98:

  1. The string was an ANSI string that contained the bytes and was converted to a Unicode string, so in (little-endian) memory the bytes are 0x42 0x00 0x98 0x00. (The string is stored as the chars 0x0042 and 0x0098.)
  2. The string was just a byte array that you cast, or simply received, as a string, so in memory the bytes are 0x42 0x98. (The string is stored as the single char 0x9842.)

With ASCIIEncoding.GetBytes, the first situation would give 0x42 and 0x3F (ASCII for "B?"), and the second would give just 0x3F (ASCII for "?"). This is logical, because the chars are outside the valid ASCII range and the encoder does not know what to do with those values.
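To make the two results concrete, here is a minimal sketch, assuming the default replacement fallback of Encoding.ASCII (which substitutes '?' for any char it cannot represent). The class and method names are made up for illustration.

```csharp
using System;
using System.Text;

public static class AsciiFallbackDemo
{
    // Hypothetical helper: encode a string with the built-in ASCII encoder.
    // Chars above 0x7F cannot be represented, so the encoder's default
    // replacement fallback emits '?' (0x3F) for each of them.
    public static byte[] ToAscii(string s)
    {
        return Encoding.ASCII.GetBytes(s);
    }

    // Format bytes as dash-separated uppercase hex, e.g. "42-3F".
    public static string Hex(byte[] bytes)
    {
        return BitConverter.ToString(bytes);
    }
}
```

Calling AsciiFallbackDemo.Hex(AsciiFallbackDemo.ToAscii("\u0042\u0098")) should give "42-3F" (situation 1), and the same call with "\u9842" should give "3F" (situation 2).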

So I'm wondering why the bytes are in a string at all.

  • Maybe it contains bytes encoded as text (for instance Base64)?
  • Maybe you should start with a char array or a byte array?

If you really do have situation 2 and you want to get the bytes out of it, you should use the UnicodeEncoding.GetBytes call, because that will return 0x42 and 0x98.
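A small sketch of situation 2, assuming the little-endian UTF-16 encoder exposed as Encoding.Unicode (an instance of UnicodeEncoding). The class and method names are hypothetical.

```csharp
using System;
using System.Text;

public static class Utf16BytesDemo
{
    // Hypothetical helper: return the raw UTF-16 (little-endian) bytes
    // of every char in the string, two bytes per char, without a BOM.
    public static byte[] ToUtf16Bytes(string s)
    {
        return Encoding.Unicode.GetBytes(s);
    }
}
```

For the single char 0x9842, Utf16BytesDemo.ToUtf16Bytes("\u9842") should return the two bytes 0x42 and 0x98, in that order.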

If you'd like to go from a char array to a byte array, the fastest way would be marshaling. But that's not really nice, and it uses double the memory.

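// Requires: using System; using System.Runtime.InteropServices;
public Byte[] ConvertToBytes(Char[] source)
{
    // Two bytes per char (UTF-16 code unit).
    Byte[] result = new Byte[source.Length * sizeof(Char)];
    // Temporary unmanaged buffer; this is the extra memory mentioned above.
    IntPtr tempBuffer = Marshal.AllocHGlobal(result.Length);
    try
    {
        // Copy the chars out to unmanaged memory, then read them back as bytes.
        Marshal.Copy(source, 0, tempBuffer, source.Length);
        Marshal.Copy(tempBuffer, result, 0, result.Length);
    }
    finally
    {
        // Always release the unmanaged buffer, even if a copy throws.
        Marshal.FreeHGlobal(tempBuffer);
    }
    return result;
}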
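
For comparison, here is a sketch of a different technique, not the Marshal approach above: Buffer.BlockCopy can copy the raw bytes of a primitive array straight into a byte[] without the temporary unmanaged buffer. I have not benchmarked it against the version above, so treat any speed difference as an assumption to measure. The class name is made up.

```csharp
using System;

public static class BlockCopyDemo
{
    // Copies the in-memory bytes of a char[] into a new byte[].
    // Buffer.BlockCopy takes offsets and a count in bytes, not in elements.
    public static byte[] ConvertToBytes(char[] source)
    {
        byte[] result = new byte[source.Length * sizeof(char)];
        Buffer.BlockCopy(source, 0, result, 0, result.Length);
        return result;
    }
}
```

The byte order of the result follows the machine's endianness, so on a little-endian machine the char 0x9842 comes out as 0x42 followed by 0x98.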
Davy Landman
@Davy Landman: I think we could both use more details on his requirements
sixlettervariables
@sixlettervariables: Indeed, I was just trying to explain to Noah that his clarification did not make it clear enough.
Davy Landman
A: 

If you want to go from a string to binary data, you must know what encoding was used to convert the binary data to a string in the first place. Otherwise, you might not end up with the correct binary data. So, the most efficient way is likely GetBytes() on an Encoding subclass (such as UTF8Encoding), but you must know for sure which encoding.
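
As an illustration of why the encoding matters, here is a minimal sketch, assuming the binary data was originally turned into a string with ISO-8859-1 (code page 28591), which maps each byte to the char with the same value. That assumption, and the class and method names, are mine and not from the question.

```csharp
using System;
using System.Text;

public static class EncodingRoundTripDemo
{
    // ISO-8859-1 (Latin-1): every byte 0x00..0xFF maps to the char
    // with the same numeric value, so any byte sequence round-trips.
    public static readonly Encoding Latin1 = Encoding.GetEncoding(28591);

    // Decode bytes to a string with one encoding, then encode the
    // string back to bytes with another (possibly different) encoding.
    public static byte[] RoundTrip(byte[] original, Encoding decodeWith, Encoding encodeWith)
    {
        string text = decodeWith.GetString(original);
        return encodeWith.GetBytes(text);
    }
}
```

With the input bytes 0x42 0x98, RoundTrip(data, Latin1, Latin1) should return the same two bytes, while RoundTrip(data, Latin1, Encoding.UTF8) should return three bytes (0x42 0xC2 0x98), because UTF-8 encodes the char 0x0098 as two bytes.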

The comment by Kent Boogaart on the original question sums it up pretty well. ;]

bzlm