I have a hexadecimal string that I need to convert to a byte array. The best way I have found (i.e. efficient and with the least code) is:

string hexstr = "683A2134";
byte[] bytes = new byte[hexstr.Length/2];
for(int x = 0; x < bytes.Length; x++)
{
    bytes[x] = Convert.ToByte(hexstr.Substring(x * 2, 2), 16);
}

In the case where I have a 32-bit value I can do the following:

string hexstr = "683A2134";
byte[] bytes = BitConverter.GetBytes(Convert.ToInt32(hexstr, 16));
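
Note that the 32-bit shortcut is not byte-for-byte equivalent to the loop on most machines: BitConverter.GetBytes returns the bytes in the machine's native order, so on little-endian hardware "683A2134" comes back as 34 21 3A 68. A minimal sketch of a wrapper that restores the textual order (the names EndianCheck and Parse32 are hypothetical, introduced only for illustration):

```csharp
using System;

static class EndianCheck
{
    // Hypothetical helper: parse exactly one 32-bit hex value and return
    // its bytes in the same order as they appear in the string.
    public static byte[] Parse32(string hexstr)
    {
        byte[] bytes = BitConverter.GetBytes(Convert.ToInt32(hexstr, 16));
        // GetBytes uses native byte order; flip it on little-endian machines.
        if (BitConverter.IsLittleEndian) Array.Reverse(bytes);
        return bytes;
    }
}
```

With this adjustment, EndianCheck.Parse32("683A2134") should match what the Substring loop produces for the same input.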

However, what about the general case? Is there a better built-in function, or a clearer (doesn't have to be faster, but still performant) way of doing this?

I would prefer a built-in function, as there seems to be one for everything (well, common things) except this particular conversion.

+4  A: 

There's nothing built-in, unfortunately. (I really should have the code I've got here somewhere else - it's at least the 3rd or 4th time I've written it.)

You could certainly create a more efficient version which parsed a nybble from a char rather than taking a substring each time, but it's more code. If you're using this a lot, benchmark the original code to see whether or not it's adequate first.

private static int ParseNybble(char nybble)
{
    // Alternative implementations: use a lookup array
    // after doing some bounds checking, or use 
    // if (nybble >= '0' && nybble <= '9') return nybble-'0' etc
    switch (nybble)
    {
        case '0' : return 0;
        case '1' : return 1;
        case '2' : return 2;
        case '3' : return 3;
        case '4' : return 4;
        case '5' : return 5;
        case '6' : return 6;
        case '7' : return 7;
        case '8' : return 8;
        case '9' : return 9;
        case 'a': case 'A' : return 10;
        case 'b': case 'B' : return 11;
        case 'c': case 'C' : return 12;
        case 'd': case 'D' : return 13;
        case 'e': case 'E' : return 14;
        case 'f': case 'F' : return 15;
        default: throw new ArgumentOutOfRangeException();
    }
}

public static byte[] ParseHex(string hex)
{
    // Do error checking here - hex is null or odd length
    byte[] ret = new byte[hex.Length/2];
    for (int i=0; i < ret.Length; i++)
    {
        ret[i] = (byte) ((ParseNybble(hex[i*2]) << 4) |
                         (ParseNybble(hex[i*2+1])));
    }
    return ret;
}
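
The comment in ParseNybble mentions range comparisons as an alternative to the switch. A rough sketch of what that variant might look like (the class and method names here are hypothetical, not part of the answer's code):

```csharp
using System;

static class HexNybble
{
    // Hypothetical alternative to the switch-based ParseNybble:
    // map one hex digit to 0..15 using range comparisons.
    public static int ParseNybbleByRange(char nybble)
    {
        if (nybble >= '0' && nybble <= '9') return nybble - '0';
        if (nybble >= 'a' && nybble <= 'f') return nybble - 'a' + 10;
        if (nybble >= 'A' && nybble <= 'F') return nybble - 'A' + 10;
        // Anything else is not a hex digit.
        throw new ArgumentOutOfRangeException("nybble");
    }
}
```

It behaves the same as the switch for valid digits and still throws on anything else; whether it is faster would need to be benchmarked.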
Jon Skeet
Thanks for spotting the error (I was playing with 2 different versions)
Robert Wagner
A: 
public class HexCodec {
  private static final char[] kDigits =
      { '0', '1', '2', '3', '4', '5', '6', '7', '8', '9',
        'a', 'b', 'c', 'd', 'e', 'f' };

  public static byte[] HexToBytes(char[] hex) {
    int length = hex.length / 2;
    byte[] raw = new byte[length];
    for (int i = 0; i < length; i++) {
      int high = Character.digit(hex[i * 2], 16);
      int low = Character.digit(hex[i * 2 + 1], 16);
      int value = (high << 4) | low;
      if (value > 127)
        value -= 256;
      raw[i] = (byte) value;
    }
    return raw;
  }

  public static byte[] HexToBytes(String hex) {
    return hexToBytes(hex.toCharArray());
  }
}
Koistya Navin
Isn't this the same as my example?
Robert Wagner
No - it swallows exceptions and returns bad data ;)
Jon Skeet
That's true. I'm working with validated data though, I know it's a valid hex string.
Robert Wagner
It's not C#, so what is it? J#?
Guffa
@Guffa Looks like C# to me? Which bit looks wrong?
Robert Wagner
@Robert: The "final" keyword, the case insensitive identifiers, the "Character" class...
Guffa
+5  A: 

You get the best performance if you calculate the values from the character codes instead of creating substrings and parsing them.

Code in C#, that handles both upper and lower case hex (but no validation):

static byte[] ParseHexString(string hex) {
 byte[] bytes = new byte[hex.Length / 2];
 int shift = 4;
 int offset = 0;
 foreach (char c in hex) {
  int b = (c - '0') % 32;
  if (b > 9) b -= 7;
  bytes[offset] |= (byte)(b << shift);
  shift ^= 4;
  if (shift != 0) offset++;
 }
 return bytes;
}

Usage:

byte[] bytes = ParseHexString("1fAB44AbcDEf00");

As the code uses a few tricks, here is a commented version:

static byte[] ParseHexString(string hex) {
 // array to put the result in
 byte[] bytes = new byte[hex.Length / 2];
 // variable to determine shift of high/low nibble
 int shift = 4;
 // offset of the current byte in the array
 int offset = 0;
 // loop the characters in the string
 foreach (char c in hex) {
  // get character code in range 0-9, 17-22
  // the % 32 handles lower case characters
  int b = (c - '0') % 32;
  // correction for a-f
  if (b > 9) b -= 7;
  // store nibble (4 bits) in byte array
  bytes[offset] |= (byte)(b << shift);
  // toggle the shift variable between 0 and 4
  shift ^= 4;
  // move to next byte
  if (shift != 0) offset++;
 }
 return bytes;
}
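
To make the "no validation" caveat concrete, here is a small sketch that isolates the per-character arithmetic and pairs it with an optional up-front check. NibbleTrick, NibbleValue and IsValidHex are hypothetical names used only for illustration; Uri.IsHexDigit is an existing framework method.

```csharp
using System;

static class NibbleTrick
{
    // Same arithmetic as in ParseHexString, isolated for one character.
    public static int NibbleValue(char c)
    {
        int b = (c - '0') % 32;   // '0'..'9' -> 0..9, 'A'..'F' and 'a'..'f' -> 17..22
        if (b > 9) b -= 7;        // 17..22 -> 10..15
        return b;
    }

    // Hypothetical guard: reject null, odd lengths and non-hex characters.
    public static bool IsValidHex(string hex)
    {
        if (hex == null || hex.Length % 2 != 0) return false;
        foreach (char c in hex)
        {
            if (!Uri.IsHexDigit(c)) return false;
        }
        return true;
    }
}
```

For valid digits NibbleValue gives 0 to 15, but a character such as 'G' quietly yields 16 instead of failing, so calling something like IsValidHex first is one way to keep the fast loop while rejecting bad input.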
Guffa
Hard to decide between this one and Jon's. Jon's is a lot easier to understand, but I think this one is more 'elegant'.
Robert Wagner
A: 

Here's a one-liner using LINQ. It's basically just a translation of your original version:

string hexstr = "683A2134";

byte[] bytes = Enumerable.Range(0, hexstr.Length / 2)
    .Select((x, i) => Convert.ToByte(hexstr.Substring(i * 2, 2), 16))
    .ToArray();

If you'll potentially need to convert strings of odd length (i.e. if they might have an implicit leading zero) then the code becomes a bit more complicated:

string hexstr = "683A2134F";    // should be treated as "0683A2134F"

byte[] bytes = Enumerable.Range(0, (hexstr.Length / 2) + (hexstr.Length & 1))
    .Select((x, i) => Convert.ToByte(hexstr.Substring((i * 2) - (i == 0 ? 0 : hexstr.Length & 1), 2 - (i == 0 ? hexstr.Length & 1 : 0)), 16))
    .ToArray();
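
An arguably easier-to-read alternative for the odd-length case is to prepend the implicit zero first and then reuse the simple pairwise conversion. This is only a sketch; OddLengthHex and ParsePadded are hypothetical names:

```csharp
using System;
using System.Linq;

static class OddLengthHex
{
    // Hypothetical helper: pad to an even length, then convert pairwise.
    public static byte[] ParsePadded(string hexstr)
    {
        // "683A2134F" becomes "0683A2134F"; even-length input is unchanged.
        string padded = (hexstr.Length % 2 == 1) ? "0" + hexstr : hexstr;
        return Enumerable.Range(0, padded.Length / 2)
            .Select(i => Convert.ToByte(padded.Substring(i * 2, 2), 16))
            .ToArray();
    }
}
```

This costs one extra string allocation for odd-length input, in exchange for dropping the index arithmetic inside the lambda.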
LukeH
+1  A: 

Take a look at this - it's very short and is part of the .NET framework:

System.Runtime.Remoting.Metadata.W3cXsd2001.SoapHexBinary.Parse("C3B01051359947").Value

Gart