Hi,

What's the fastest way to convert a string into a byte[] array in C#? I'm sending tons of string data through sockets and need to optimize every single operation. Currently I convert the strings into byte[] arrays before sending, using:

private static readonly Encoding encoding = new ASCIIEncoding();
//...
byte[] bytes = encoding.GetBytes(someString);
socket.Send(bytes);
//...
+6  A: 

I would say that how you are doing it now is plenty good. If you are really concerned with very low-level optimization like that, the best recommendation I can make is to get Reflector. With Reflector, you can look at the code yourself (most of the time) and see what the algorithms are. If Reflector does not show you, you could always download Microsoft's SSCLI (Shared Source Common Language Infrastructure) to see the C++ code behind MethodImplOptions.InternalCall methods.

For reference, here is the actual implementation of Encoding.ASCII.GetBytes:

public override int GetBytes(string chars, int charIndex, int charCount, byte[] bytes, int byteIndex)
{
    if ((chars == null) || (bytes == null))
    {
        throw new ArgumentNullException();
    }
    if ((charIndex < 0) || (charCount < 0))
    {
        throw new ArgumentOutOfRangeException();
    }
    if ((chars.Length - charIndex) < charCount)
    {
        throw new ArgumentOutOfRangeException();
    }
    if ((byteIndex < 0) || (byteIndex > bytes.Length))
    {
        throw new ArgumentOutOfRangeException();
    }
    if ((bytes.Length - byteIndex) < charCount)
    {
        throw new ArgumentException();
    }
    int num = charIndex + charCount;
    while (charIndex < num)
    {
        char ch = chars[charIndex++];
        if (ch >= '\x0080')
        {
            ch = '?';
        }
        bytes[byteIndex++] = (byte) ch;
    }
    return charCount;
}
jrista
+2  A: 

I imagine the GetBytes() function is already well optimized for this. I can't think of any suggestions to improve the speed of your existing code.

EDIT -- You know, I don't know if this is faster or not. But here's another method using the BinaryFormatter:

BinaryFormatter bf = new BinaryFormatter();
MemoryStream ms = new MemoryStream();
bf.Serialize(ms, someString);
byte[] bytes =  ms.ToArray();
ms.Close();
socket.Send(bytes);

The reason I think this might be faster is that it skips the encoding step. I'm also not entirely sure this will work properly, but you might try it and see. Of course, if you need the ASCII encoding then this won't help.

I just had another thought. I believe this code would return twice as many bytes as GetBytes with ASCII encoding. The reason is that all strings in .NET use Unicode (UTF-16) behind the scenes, which takes 2 bytes per character, whereas ASCII uses just 1. So the BinaryFormatter is probably not the thing to use in this case, because you'd be doubling the amount of data you're sending over the socket.
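
The one-byte versus two-byte difference is easy to check. Here is a minimal sketch, assuming the comparison is between ASCII and the in-memory UTF-16 form (Encoding.Unicode); the EncodingSizes class and its method names are made up for illustration. It does not measure what BinaryFormatter actually writes, which also includes serialization headers.

```csharp
using System.Text;

static class EncodingSizes
{
    // Bytes needed to send the text as ASCII: one per character.
    public static int AsciiBytes(string text)
    {
        return Encoding.ASCII.GetByteCount(text);
    }

    // Bytes needed for the same text as UTF-16 (Encoding.Unicode):
    // two per UTF-16 code unit.
    public static int Utf16Bytes(string text)
    {
        return Encoding.Unicode.GetByteCount(text);
    }
}
```

For plain ASCII text the second number is always exactly double the first.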

Steve Wortham
Just a note about using a binary formatter and memory stream. You would have to construct those two objects each time you needed to convert a string, whereas with ASCIIEncoding you just call a method and that's all. Object construction cost is fairly high at this low level, and could be a major factor.
jrista
Excellent point. This may be something you'd only want to consider with large strings where the length of the string offsets the construction cost. Of course, this is all theoretical (at least to me). I don't even know if this method would ever be faster.
Steve Wortham
+1  A: 

What are you trying to optimize for? CPU? Bandwidth?

If you're trying to optimize bandwidth, you could try compressing the string data beforehand.

First, profile your code, figure out what the slow bits are, before you try to optimize at such a low level.
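
If bandwidth does turn out to be the constraint, compression could look roughly like the sketch below. It assumes GZip via System.IO.Compression and ASCII payloads; the PayloadCompression class is hypothetical, and the receiving side would need the matching decompress step. Note that this trades CPU time for bytes on the wire, so it only helps when the network is the bottleneck.

```csharp
using System.IO;
using System.IO.Compression;
using System.Text;

static class PayloadCompression
{
    // Encode the text as ASCII, then GZip the resulting bytes.
    public static byte[] CompressAscii(string text)
    {
        byte[] raw = Encoding.ASCII.GetBytes(text);
        using (var output = new MemoryStream())
        {
            using (var gzip = new GZipStream(output, CompressionMode.Compress))
            {
                gzip.Write(raw, 0, raw.Length);
            } // disposing the GZipStream flushes the final compressed block

            // ToArray still works after the MemoryStream has been closed.
            return output.ToArray();
        }
    }

    // Reverse of CompressAscii, as the receiver would run it.
    public static string DecompressAscii(byte[] compressed)
    {
        using (var input = new MemoryStream(compressed))
        using (var gzip = new GZipStream(input, CompressionMode.Decompress))
        using (var result = new MemoryStream())
        {
            gzip.CopyTo(result);
            return Encoding.ASCII.GetString(result.ToArray());
        }
    }
}
```

Very short strings can come out larger than the input because of the GZip header, so it is worth measuring on real messages.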

Nader Shirazie
+1: Yes, yes, yes
Walt W
I'm optimizing for CPU
Nosrama
You should also consider *memory bus* bandwidth. When performing computationally simple operations on large amounts of data, it's often the case that the CPU spends most of its time waiting on the much slower clock of the FSB.
Crashworks
A: 

As others have said, the Encoding class is already optimized for that task, so it will probably be hard to make it faster. There's one micro-optimization that you could do: use Encoding.ASCII rather than new ASCIIEncoding(). But as everyone knows, micro-optimizations are bad ;)

Thomas Levesque
+11  A: 

If all your data is really going to be ASCII, then you may be able to do it slightly faster than ASCIIEncoding, which has various (entirely reasonable) bits of error handling etc. You may also be able to speed it up by avoiding creating new byte arrays all the time. Assuming you have an upper bound which all your messages will be under:

void QuickAndDirtyAsciiEncode(string chars, byte[] buffer)
{
    int length = chars.Length;
    for (int i = 0; i < length; i++)
    {
        buffer[i] = (byte) (chars[i] & 0x7f);
    }
}

You'd then do something like:

readonly byte[] Buffer = new byte[8192]; // Reuse this repeatedly
...
QuickAndDirtyAsciiEncode(text, Buffer);
// We know ASCII takes one byte per character
socket.Send(Buffer, text.Length, SocketFlags.None);

This is pretty desperate optimisation though. I'd stick with ASCIIEncoding until I'd proven that this was the bottleneck (or at least that this sort of grotty hack doesn't help).
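
A middle ground, if the main cost is allocating a fresh array per message, is the GetBytes overload that writes into an existing array. This is only a sketch: the ReusableAsciiEncoder name is made up, and it assumes the buffer is large enough for every message, as in the example above. It keeps the framework's error handling, so it will not be as fast as the hand-rolled loop.

```csharp
using System.Text;

static class ReusableAsciiEncoder
{
    // Encodes text into the start of a caller-supplied buffer and
    // returns the number of bytes written. Throws ArgumentException
    // if the buffer is too small. Non-ASCII characters become '?'.
    public static int EncodeInto(string text, byte[] buffer)
    {
        return Encoding.ASCII.GetBytes(text, 0, text.Length, buffer, 0);
    }
}
```

You would then call socket.Send(buffer, count, SocketFlags.None) with the returned count, exactly as with the quick-and-dirty version.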

Jon Skeet
+1 for *desperate*
Nader Shirazie
James Schek
@James Schek: Only if it fails! ;-) Also, it's inappropriate here, as this is an actual type conversion, not a type check, *and* the `as` keyword can only be used for types that can be `null` (i.e. reference types and `Nullable<T>` / `T?`).
P Daddy
@P-Daddy--thanks for the clarification!
James Schek
+1  A: 

With no clue to your concurrency requirements (or anything else): Can you spawn some threads on the ThreadPool that convert the strings to byte arrays and drop them into a Queue, and have one more thread watching the Queue and sending the data?
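
Roughly what I have in mind, as a sketch only. The EncodingPipeline class is hypothetical, it assumes .NET 4 or later for BlockingCollection, and the send delegate stands in for your socket.Send call. Be aware that messages encoded on the ThreadPool can reach the queue out of order, so this only works if ordering doesn't matter or each message carries its own sequence number.

```csharp
using System;
using System.Collections.Concurrent;
using System.Text;
using System.Threading;

sealed class EncodingPipeline : IDisposable
{
    private readonly BlockingCollection<byte[]> _queue = new BlockingCollection<byte[]>();
    // Starts at 1 so the count cannot reach zero before Dispose is called.
    private readonly CountdownEvent _pending = new CountdownEvent(1);
    private readonly Thread _sender;

    public EncodingPipeline(Action<byte[]> send)
    {
        // Single consumer thread: takes encoded messages off the queue and sends them.
        _sender = new Thread(() =>
        {
            foreach (byte[] bytes in _queue.GetConsumingEnumerable())
            {
                send(bytes);
            }
        });
        _sender.IsBackground = true;
        _sender.Start();
    }

    // Queues a ThreadPool work item that encodes the text and enqueues the bytes.
    public void Enqueue(string text)
    {
        _pending.AddCount();
        ThreadPool.QueueUserWorkItem(_ =>
        {
            try { _queue.Add(Encoding.ASCII.GetBytes(text)); }
            finally { _pending.Signal(); }
        });
    }

    // Waits for outstanding encode jobs, then lets the sender drain the queue and exit.
    public void Dispose()
    {
        _pending.Signal();        // release the initial count
        _pending.Wait();          // all encode jobs have finished
        _queue.CompleteAdding();
        _sender.Join();
        _queue.Dispose();
        _pending.Dispose();
    }
}
```

Whether this beats encoding inline depends on message size; for short strings the queueing overhead may well cost more than the conversion itself.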

ebpower
A: 

I'd suggest profiling what you're doing. I find it doubtful that the speed of converting a string to a byte array is a larger problem in throughput than the speed of the socket itself.
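
Short of a full profiler, a rough timing like the sketch below would at least give a number to compare against the socket send. The EncodeTiming class is made up for illustration, and a Stopwatch loop like this is only a crude measurement, not a substitute for profiling the real application.

```csharp
using System.Diagnostics;
using System.Text;

static class EncodeTiming
{
    // Returns the average time per GetBytes call, in microseconds.
    public static double AverageMicroseconds(string sample, int iterations)
    {
        Encoding encoding = Encoding.ASCII;
        encoding.GetBytes(sample); // warm-up call so JIT compilation is not timed

        Stopwatch watch = Stopwatch.StartNew();
        for (int i = 0; i < iterations; i++)
        {
            encoding.GetBytes(sample);
        }
        watch.Stop();

        return watch.Elapsed.TotalMilliseconds * 1000.0 / iterations;
    }
}
```

Run it with a string of typical length, then time socket.Send the same way and compare the two.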

kyoryu
In the comments he explains that he has profiled it and traced the bottleneck to this conversion.
Crashworks
A: 

Just another tip: I don't know how you create your initial strings, but remember that StringBuilder.Append("something") is much faster than something like myString += "something" when done repeatedly.
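
For example, a sketch with a made-up MessageBuilder helper; how you actually assemble your messages is an assumption on my part:

```csharp
using System.Text;

static class MessageBuilder
{
    // Appends every part to one StringBuilder instead of using +=,
    // which would allocate a new string on each iteration.
    public static string Join(string[] parts)
    {
        var builder = new StringBuilder();
        foreach (string part in parts)
        {
            builder.Append(part);
        }
        return builder.ToString();
    }
}
```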

In the whole process of creating the strings and sending them through a socket connection, I would be surprised if the bottleneck were the conversion of strings into byte arrays. But I'd be very interested if someone tested this with a profiler.

Ben
