




I would like to put a string into a byte array, but the string may be too big to fit. In the case where it's too large, I would like to put as much of the string as possible into the array. Is there an efficient way to find out how many characters will fit?

+2  A: 

To truncate a string so that it fits a UTF-8 byte array without splitting a character in the middle, I use this:

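// Requires: using System.Text;
static string Truncate(string s, int maxLength) {
    // Fast path: the whole string already fits.
    if (Encoding.UTF8.GetByteCount(s) <= maxLength)
        return s;
    var cs = s.ToCharArray();
    int length = 0; // bytes consumed so far
    int i = 0;      // index of the next char to consider
    while (i < cs.Length) {
        // A surrogate pair is two chars that must be kept together.
        int charSize = 1;
        if (i < (cs.Length - 1) && char.IsSurrogate(cs[i]))
            charSize = 2;
        int byteSize = Encoding.UTF8.GetByteCount(cs, i, charSize);
        if ((byteSize + length) <= maxLength) {
            i = i + charSize;
            length += byteSize;
        }
        else
            break; // the next character would not fit
    }
    return s.Substring(0, i);
}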

The returned string can then be safely transferred to a byte array of length maxLength.
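
The transfer step itself could look roughly like the sketch below. CopyUtf8 and BufferFill are hypothetical names used only for illustration; the sketch assumes the string has already been truncated, and relies on the GetBytes overload that writes into an existing array and throws ArgumentException when the array is too small.

```csharp
using System;
using System.Text;

static class BufferFill
{
    // Hypothetical helper: encodes an already-truncated string into the
    // start of a fixed-size buffer and returns the number of bytes written.
    // Encoding.GetBytes throws ArgumentException if the buffer is too small.
    public static int CopyUtf8(string truncated, byte[] buffer)
    {
        return Encoding.UTF8.GetBytes(truncated, 0, truncated.Length, buffer, 0);
    }
}
```

For example, CopyUtf8("h\u00E9llo", new byte[8]) returns 6, because 'é' takes two bytes in UTF-8 and the other four characters take one each.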

+1  A: 

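You should be using the Encoding class to do your conversion to a byte array, correct? All Encoding objects override the method GetMaxCharCount, which will give you "The maximum number of characters produced by decoding the specified number of bytes." You should be able to use this value as a first cut when trimming your string. Keep in mind that it is an upper bound: in a variable-width encoding such as UTF-8, that many characters can still encode to more bytes than the array holds, so check the result before encoding it.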
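
A minimal sketch of that idea, assuming UTF-8. FirstCutFits and MaxCharCountSketch are hypothetical names. Because GetMaxCharCount is only an upper bound on the character count, the sketch measures the first cut with GetByteCount instead of trusting it.

```csharp
using System;
using System.Text;

static class MaxCharCountSketch
{
    // Hypothetical helper: trims s to the upper bound reported by
    // GetMaxCharCount, then reports whether that first cut really fits
    // in maxBytes by measuring it with GetByteCount.
    public static bool FirstCutFits(string s, int maxBytes, out string firstCut)
    {
        int upperBound = Encoding.UTF8.GetMaxCharCount(maxBytes);
        firstCut = s.Length <= upperBound ? s : s.Substring(0, upperBound);
        return Encoding.UTF8.GetByteCount(firstCut) <= maxBytes;
    }
}
```

With four 'é' characters and a 4-byte limit the first cut does not fit, since each 'é' needs two bytes; further trimming is still required in that case.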

+1  A: 

An efficient way would be to find out how many bytes you will (pessimistically) need per character with


then divide the size of your byte array by the result, and convert that many characters with

public virtual int Encoding.GetBytes (
 string s,
 int charIndex,
 int charCount,
 byte[] bytes,
 int byteIndex
)

If you want to use less memory use


but that is a much slower method.
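
Put together, the pessimistic approach might look like the sketch below. The call used for the per-character estimate is missing from the post, so GetMaxByteCount(1) is an assumption here, as are the names FillPessimistic and PessimisticFill. The sketch also assumes the encoding's worst case scales linearly with the character count, which holds for UTF-8.

```csharp
using System;
using System.Text;

static class PessimisticFill
{
    // Hypothetical helper: encodes a prefix of s that is guaranteed to fit
    // into buffer and returns the number of chars consumed. It never measures
    // the string, so it usually leaves part of the buffer unused.
    public static int FillPessimistic(string s, byte[] buffer, Encoding encoding, out int bytesWritten)
    {
        // Assumption: worst-case bytes per char comes from GetMaxByteCount(1).
        int worstCasePerChar = encoding.GetMaxByteCount(1);
        int charCount = Math.Min(s.Length, buffer.Length / worstCasePerChar);

        // Do not end the prefix on the first half of a surrogate pair.
        if (charCount > 0 && charCount < s.Length && char.IsHighSurrogate(s[charCount - 1]))
            charCount--;

        bytesWritten = encoding.GetBytes(s, 0, charCount, buffer, 0);
        return charCount;
    }
}
```

The trade-off is speed for space: no pass over the string is needed to size the prefix, but far fewer characters are stored than an exact measurement would allow.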


The Encoding class in .NET has a method called GetByteCount which can take in a string or char[]. If you pass in 1 character, it will tell you how many bytes are needed for that 1 character in whichever encoding you are using.

The method GetMaxByteCount is faster, but it does a worst case calculation which could return a higher number than is actually needed.

Joseph Daigle
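
To make the difference between the two methods concrete, here is a small sketch assuming UTF-8. The class and method names are hypothetical.

```csharp
using System;
using System.Text;

static class ByteCountCompare
{
    // Exact size of one character; a lone surrogate is measured as whatever
    // the encoder substitutes for it.
    public static int BytesFor(char c)
    {
        return Encoding.UTF8.GetByteCount(new[] { c });
    }

    // Exact size of the whole string: has to walk every character.
    public static int Exact(string s)
    {
        return Encoding.UTF8.GetByteCount(s);
    }

    // Worst case for a string of this length: arithmetic on the length only,
    // never less than the exact count.
    public static int WorstCase(string s)
    {
        return Encoding.UTF8.GetMaxByteCount(s.Length);
    }
}
```

BytesFor('a') is 1 and BytesFor('\u00E9') is 2, while WorstCase is at least as large as Exact for any string.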

Cookey, your code doesn't do what you apparently think it does. Pre-allocating the byte buffer in your case is pure waste because it will not be used. Rather, your assignment drops the allocated memory and resets the arr reference to point to another buffer because Encoding.GetBytes returns a new array.

Konrad Rudolph
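
The code being commented on is not shown in this thread, so the following is only a sketch of the pattern described, with hypothetical names. It shows that the GetBytes(string) overload hands back a fresh array instead of filling the one that was pre-allocated.

```csharp
using System;
using System.Text;

static class GetBytesAllocation
{
    // Returns true when GetBytes(string) produced a different array than the
    // pre-allocated one, meaning the pre-allocation was wasted.
    public static bool ReturnsFreshArray(string s)
    {
        byte[] preallocated = new byte[64];
        byte[] arr = preallocated;

        // Allocates and returns a new array; 'preallocated' is never written to.
        arr = Encoding.UTF8.GetBytes(s);

        return !ReferenceEquals(arr, preallocated);
    }
}
```

To actually reuse a buffer, the overload that takes a byte[] and a byteIndex, quoted in an earlier answer, writes into the existing array.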