ansaurus

Question

How to turn a hex string into an unsigned char array?

Answer 1

+3 A:

This answers the original question, which asked for a C++ solution.

You can use an istringstream with the hex manipulator:

std::string hex_chars("E8 48 D8 FF FF 8B 0D");

std::istringstream hex_chars_stream(hex_chars);
std::vector<unsigned char> bytes;

unsigned int c;
while (hex_chars_stream >> std::hex >> c)
{
    bytes.push_back(c);
}

Note that c must be an int (or long, or some other integer type), not a char; if it is a char (or unsigned char), the wrong >> overload will be called and individual characters will be extracted from the string, not hexadecimal integer strings.

Additional error checking to ensure that the extracted value fits within a char would be a good idea.

James McNellis 2010-07-10 23:09:55

+1 and deleting my equivalent (but not as good) answer.

Billy ONeal 2010-07-10 23:11:39

Because I cannot give two correct answers, I went ahead and upvoted this one, as this definitely is a great solution for C++ users!

Gbps 2010-07-11 00:50:23

Answer 2

+5 A:

Iterate through all the characters.
- If you have a hex digit, the number is (ch >= 'A')? (ch - 'A' + 10): (ch - '0').
  - Left shift your accumulator by four bits and add (or OR) in the new digit.
- If you have a space, and the previous character was not a space, then append your current accumulator value to the array and reset the accumulator back to zero.

Ben Voigt 2010-07-10 23:20:02
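
The steps in this answer could be sketched in C roughly as follows. This is only an illustration under a few assumptions: the function name `parse_hex_bytes`, its signature, and the `(size_t)-1` error return are made up here, lowercase digits are accepted as an extra, and the `'A'`..`'F'` arithmetic assumes an ASCII-like character set.

```c
#include <ctype.h>
#include <stddef.h>

/* Hypothetical helper: parses space-separated hex values from s into out.
 * Returns the number of bytes stored, or (size_t)-1 on an unexpected
 * character, a value above 0xFF, or when out (capacity cap) is too small. */
size_t parse_hex_bytes(const char *s, unsigned char *out, size_t cap)
{
    size_t n = 0;
    unsigned acc = 0;       /* accumulator for the value being built */
    int in_number = 0;      /* nonzero while inside a run of hex digits */

    for (;; s++) {
        unsigned char ch = (unsigned char)*s;
        if (isxdigit(ch)) {
            unsigned digit = isdigit(ch) ? (unsigned)(ch - '0')
                                         : (unsigned)(toupper(ch) - 'A' + 10);
            acc = (acc << 4) | digit;   /* shift left four bits, OR in the digit */
            if (acc > 0xFF) {
                return (size_t)-1;      /* does not fit in one byte */
            }
            in_number = 1;
        } else if (ch == ' ' || ch == '\0') {
            if (in_number) {            /* a value just ended: store it and reset */
                if (n == cap) {
                    return (size_t)-1;
                }
                out[n++] = (unsigned char)acc;
                acc = 0;
                in_number = 0;
            }
            if (ch == '\0') {
                return n;
            }
        } else {
            return (size_t)-1;          /* neither a hex digit nor a space */
        }
    }
}
```

Because a value is only stored when a run of digits ends, repeated, leading, and trailing spaces are all tolerated, and a single digit such as `A` is read as `0x0A`.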

+1: This is probably the most straightforward and simple way to do it.

James McNellis 2010-07-10 23:22:40

That's basically what I did, except that I used a switch instead of a ternary test. Depending on the compiler and processor architecture, one or the other may be faster. But you should also check that every character is in the range 0-9A-F, and with the ternary that means testing the same thing twice.

kriss 2010-07-10 23:42:22

@kriss: It's all in the assumptions. You assume that there must be exactly two hex digits and one space between each value, mine allows omission of a leading zero or multiple spaces, but assumes that there are no other classes of characters in the string. If you can't assume that, I'd probably choose to do validation separately, by testing `if (s[strspn(s, " 0123456789ABCDEF")]) /* error */;` Sure, it's another pass on the string, but so much cleaner. Or avoid the second pass over the string by using `isspace` and `isxdigit` on each character, which uses a lookup table for speed.

Ben Voigt 2010-07-11 00:19:28
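
A rough sketch of the two validation styles mentioned in the comment above, in C. The function names are invented for illustration. Note that the two checks are not equivalent: the `strspn` version accepts only the listed characters, while the `<ctype.h>` version is looser.

```c
#include <ctype.h>
#include <string.h>

/* Separate-pass style: strspn returns the length of the leading run made
 * up only of the listed characters. If that run ends at the terminator,
 * nothing else appeared in the string. Uppercase digits only. */
int only_hex_and_spaces(const char *s)
{
    return s[strspn(s, " 0123456789ABCDEF")] == '\0';
}

/* Per-character style: classify each character as it is visited.
 * isxdigit also accepts a-f, and isspace also accepts tabs and newlines. */
int only_hex_and_spaces_ctype(const char *s)
{
    for (; *s != '\0'; s++) {
        unsigned char ch = (unsigned char)*s;   /* avoid negative char values */
        if (!isspace(ch) && !isxdigit(ch)) {
            return 0;
        }
    }
    return 1;
}
```

In a real parser the per-character test would sit inside the conversion loop itself rather than in its own function; it is split out here only to keep the comparison readable.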

@Ben: Looping around switches is not really an issue; I don't see it as a real difference. I chose to assume there were exactly two hex characters per value in the input, because if you allow more than that you should also range-check the values. And what about allowing negative numbers? We would have to manage the sign, etc. A switch *is* a kind of lookup table... (and another fast conversion method would be to really use one, implemented as an array).

kriss 2010-07-11 00:40:37
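
For what it's worth, the array-based lookup table mentioned at the end of the comment above might look something like this in C. It is a sketch only: `hex_value`, `init_hex_table` and `hex_pair_to_byte` are names made up for illustration, the table must be initialized once before use, and filling the `A`..`F` entries by offset assumes an ASCII-like character set.

```c
#include <limits.h>

/* One entry per possible unsigned char value; -1 marks "not a hex digit". */
static signed char hex_value[UCHAR_MAX + 1];

/* Fill the table once before calling hex_pair_to_byte. */
void init_hex_table(void)
{
    int i;
    for (i = 0; i <= UCHAR_MAX; i++) {
        hex_value[i] = -1;
    }
    for (i = 0; i < 10; i++) {
        hex_value['0' + i] = (signed char)i;
    }
    for (i = 0; i < 6; i++) {
        hex_value['A' + i] = (signed char)(10 + i);
        hex_value['a' + i] = (signed char)(10 + i);
    }
}

/* Converts exactly two hex characters at p into 0..255.
 * Returns -1 if either character is not a hex digit (including a
 * terminator, so a one-character string is rejected without reading
 * past its end). */
int hex_pair_to_byte(const char *p)
{
    int hi = hex_value[(unsigned char)p[0]];
    int lo = (hi < 0) ? -1 : hex_value[(unsigned char)p[1]];
    if (hi < 0 || lo < 0) {
        return -1;
    }
    return hi * 16 + lo;
}
```

The table folds validation and conversion into a single array read per character, which is the point being made about switches and lookup tables.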

The problem specified that all inputs were unsigned. The problem didn't specify that there would always be zeros padding to exactly two digits (e.g. all of these fit in a `char`: `0xA`, `0x0A`, `0x000A`) or just one space, although these assumptions were true on the sample input.

Ben Voigt 2010-07-11 01:23:05

Answer 3

A:

The old C way: do it by hand ;-) (there are many shorter ways, but I'm not golfing, I'm optimizing for run time).

enum { NBBYTES = 7 };
char res[NBBYTES+1];
const char * c = "E8 48 D8 FF FF 8B 0D";
const char * p = c;
int i = 0;

for (i = 0; i < NBBYTES; i++){
    switch (*p){
    case '0': case '1': case '2': case '3': case '4':
    case '5': case '6': case '7': case '8': case '9':
        res[i] = *p - '0';
        break;
    case 'A': case 'B': case 'C': case 'D': case 'E': case 'F':
        res[i] = *p - 'A' + 10;
        break;
    default:
        // parse error, throw exception
        ;
    }
    p++;
    switch (*p){
    case '0': case '1': case '2': case '3': case '4':
    case '5': case '6': case '7': case '8': case '9':
        res[i] = res[i]*16 + *p - '0';
        break;
    case 'A': case 'B': case 'C': case 'D': case 'E': case 'F':
        res[i] = res[i]*16 + *p - 'A' + 10;
        break;
    default:
        // parse error, throw exception
        ;
    }
    p++;
    if (*p == 0) { continue; }
    if (*p == ' ') { p++; continue; }
    // parse error, throw exception
}

// let's show the result, C style IO, just cout if you want C++
for (i = 0 ; i < 7; i++){
   printf("%2.2x ", 0xFF & res[i]);
}
printf("\n");

Now another one that allows any number of digits per value and any number of spaces between values, including leading or trailing spaces (Ben's specs):

#include <stdio.h>
#include <stdlib.h>

int main(){
    enum { NBBYTES = 7 };
    char res[NBBYTES];
    const char * c = "E8 48 D8 FF FF 8B 0D";
    const char * p = c;
    int i = -1;   /* no value started yet; res[++i] = 0 below opens each one */
    char ch = ' ';
    while (ch && i < NBBYTES){
       switch (ch){
       case '0': case '1': case '2': case '3': case '4':
       case '5': case '6': case '7': case '8': case '9':
          ch -= '0' + 10 - 'A';
       case 'A': case 'B': case 'C': case 'D': case 'E': case 'F':
          ch -= 'A' - 10;
          res[i] = res[i]*16 + ch;
          break;
       case ' ':
         if (*p != ' ' && *p != 0) {   /* a value starts only if a digit follows */
             if (i == NBBYTES-1){
                 printf("parse error, throw exception\n");
                 exit(-1);
            }
            res[++i] = 0;
         }
         break;
       case 0:
         break;
       default:
         printf("parse error, throw exception\n");
         exit(-1);
       }
       ch = *(p++);
    }
    if (i != NBBYTES-1){
        printf("parse error, throw exception\n");
        exit(-1);
    }

   for (i = 0 ; i < 7; i++){
      printf("%2.2x ", 0xFF & res[i]);
   }
   printf("\n");
}

No, it's not really obfuscated... but well, it looks like it is.

kriss 2010-07-10 23:37:22

Are we allowed to say 'Ick!'? (If only because the code will 'throw exception' on the last loop, because there are only 6 spaces in the string, not 7 as the code requires.)

Jonathan Leffler 2010-07-10 23:43:49

@Jonathan: not any more... I could also have added a space to input. The old separators vs terminators debate.

kriss 2010-07-11 00:43:03

your little fix doesn't help... `*p != ' '` on the terminating NUL and it doesn't matter what you logical-or that with.

Ben Voigt 2010-07-11 01:05:33

Oops, I erred again. You should like the new fix better :-)

kriss 2010-07-11 01:15:55

Validity check is still flaky.

Ben Voigt 2010-07-11 01:24:58

@Ben: be patient, I haven't posted any change yet...

kriss 2010-07-11 01:29:01

Answer 4

A:

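For a pure C implementation I think you can persuade sscanf(3) to do what you want. I believe this should be portable so long as your input string only ever contains two-character hex values separated by single spaces. As pointed out in the comments, %X stores an unsigned int, so the value is read into one and then narrowed to an unsigned char.

#include <stdio.h>
#include <stdlib.h>
#include <string.h>

int main(void)
{
    char hex[] = "E8 48 D8 FF FF 8B 0D";
    size_t cnt = (strlen(hex) + 1) / 3; // Whether or not there's a trailing space
    unsigned char *result = malloc(cnt);
    size_t i;

    if (result == NULL) {
        return 1;
    }
    // Step by count rather than by *p, so we never read past the terminator
    for (i = 0; i < cnt; i++) {
        unsigned int c; // %X writes an unsigned int, not an unsigned char
        if (sscanf(hex + 3 * i, "%2X", &c) != 1) {
            break; // Didn't parse as expected
        }
        result[i] = (unsigned char)c;
    }
    // result[0..i-1] now holds the parsed bytes
    free(result);
    return 0;
}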

bjg 2010-07-11 00:15:12

Declare `c` as `unsigned int`, otherwise you could overwrite other local variables (or worse yet, your return address).

Ben Voigt 2010-07-11 00:26:56

But generally scanf is going to take longer to figure out the format code than my entire answer will, and the question did ask for an *efficient* way.

Ben Voigt 2010-07-11 00:28:04

@Ben Voigt. Yes, but does efficient mean run-time or programmer-time? '-) Anyway, thanks for pointing out that I should have made `c` an `unsigned int` and coerced that into the `result` array.

bjg 2010-07-11 01:09:50

Answer 5

+1 A:

You'll never convince me that this operation is a performance bottleneck. The efficient way is to make good use of your time by using the standard C library:

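#include <assert.h>
#include <ctype.h>
#include <stdlib.h>
#include <string.h>

static unsigned char gethex(const char *s, char **endptr) {
  assert(s);
  while (isspace((unsigned char)*s)) s++;
  assert(*s);
  return strtoul(s, endptr, 16);
}

unsigned char *convert(const char *s, int *length) {
  /* n values need at least 2n-1 characters ("A B C"), so (len + 1) / 2 bytes suffice */
  unsigned char *answer = malloc((strlen(s) + 1) / 2);
  unsigned char *p;
  for (p = answer; *s; p++)
    *p = gethex(s, (char **)&s);
  *length = p - answer;
  return answer;
}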

Compiled and tested. Works on your example.

Norman Ramsey 2010-07-11 00:23:13

I chose this as the answer because it simply provided a working example. Thanks!

Gbps 2010-07-11 00:51:59

OTOH, buffer overflow on "A B C D E F 1 2 3 4 5 6 7 8 9".

Ben Voigt 2010-07-11 01:08:38

Much simpler: `for (i=0; i<max i++) a[i]=strtol(s, ` The point being, your `gethex` function is totally redundant. `strtol` skips leading whitespace itself. If you want to be more strict about not accepting strings that don't match the pattern, you could use sscanf to control field width and measure match lengths.

R.. 2010-07-11 04:54:40
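
One way the simpler loop suggested in the comment above could be fleshed out, relying on `strtoul` to skip leading whitespace by itself. This is a guess at the intent rather than the commenter's actual code: the function name `hex_to_bytes`, the `max` bound and the stopping rules are assumptions made here.

```c
#include <stddef.h>
#include <stdlib.h>

/* Hypothetical helper: converts up to max whitespace-separated hex values
 * from s into out and returns how many were stored. Stops at the first
 * position where strtoul consumes nothing or the value exceeds 0xFF.
 * Caveat: strtoul also accepts an optional sign and a "0x" prefix, so
 * this is more permissive than a strict two-digit parser. */
size_t hex_to_bytes(const char *s, unsigned char *out, size_t max)
{
    size_t i;
    for (i = 0; i < max; i++) {
        char *end;
        unsigned long v = strtoul(s, &end, 16);
        if (end == s) {
            break;                  /* no digits found: end of input or junk */
        }
        if (v > 0xFF) {
            break;                  /* value does not fit in one byte */
        }
        out[i] = (unsigned char)v;
        s = end;                    /* continue after the parsed value */
    }
    return i;
}
```

Bounding the loop by `max` means the caller's buffer size, not the input string, decides how many bytes can be written.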

@R: great point about strtoul---I didn't read the man page carefully enough. Feel free to edit.

Norman Ramsey 2010-07-11 05:46:09
