For example, I have a cstring "E8 48 D8 FF FF 8B 0D" (including spaces) which needs to be converted into the equivalent unsigned char array {0xE8,0x48,0xD8,0xFF,0xFF,0x8B,0x0D}. What's an efficient way to do this? Thanks!

EDIT: I can't use the std library, so consider this a C question. I'm sorry!

+3  A: 

This answers the original question, which asked for a C++ solution.

You can use an istringstream with the hex manipulator:

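#include <sstream>
#include <string>
#include <vector>

std::string hex_chars("E8 48 D8 FF FF 8B 0D");

std::istringstream hex_chars_stream(hex_chars);
std::vector<unsigned char> bytes;

// Extract into an unsigned int so that >> parses whole hex numbers, not single characters.
unsigned int c;
while (hex_chars_stream >> std::hex >> c)
{
    bytes.push_back(static_cast<unsigned char>(c));
}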

Note that c must be an int (or long, or some other integer type), not a char; if it is a char (or unsigned char), the wrong >> overload will be called and individual characters will be extracted from the string, not hexadecimal integer strings.

Additional error checking to ensure that the extracted value fits within a char would be a good idea.

James McNellis
+1 and deleting my equivalent (but not as good) answer.
Billy ONeal
Because I cannot give two correct answers, I went ahead and upvoted this one, as this definitely is a great solution for C++ users!
Gbps
+5  A: 
  • Iterate through all the characters.
    • If you have a hex digit (assuming uppercase A-F), the number is (ch >= 'A')? (ch - 'A' + 10): (ch - '0').
      • Left shift your accumulator by four bits and add (or OR) in the new digit.
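    • If you have a space, and the previous character was not a space, then append your current accumulator value to the array and reset the accumulator back to zero. Do the same at the end of the string if the last character was a digit, since the final value has no space after it.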
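
The steps above can be sketched in C roughly as follows. This is only one possible reading of the list, not the answerer's own code: the names hex_digit_value and parse_hex_bytes are made up for illustration, and the checks for lowercase digits, stray characters, values above 0xFF and a full output buffer are assumptions added on top of the original description.

```c
#include <stddef.h>

/* Hypothetical helper: value of one hex digit, or -1 if ch is not a hex digit. */
static int hex_digit_value(char ch)
{
    if (ch >= '0' && ch <= '9') return ch - '0';
    if (ch >= 'A' && ch <= 'F') return ch - 'A' + 10;
    if (ch >= 'a' && ch <= 'f') return ch - 'a' + 10;
    return -1;
}

/* Parse space-separated hex values from s into out (capacity cap).
   Returns the number of bytes stored, or -1 on a bad character,
   a value above 0xFF, or more values than the buffer can hold. */
static int parse_hex_bytes(const char *s, unsigned char *out, size_t cap)
{
    size_t n = 0;
    unsigned acc = 0;
    int in_number = 0;                 /* nonzero while digits are accumulating */

    for (;; s++) {
        int d = hex_digit_value(*s);
        if (d >= 0) {
            acc = (acc << 4) | (unsigned)d;   /* shift in the new digit */
            if (acc > 0xFF) return -1;
            in_number = 1;
        } else if (*s == ' ' || *s == '\0') {
            if (in_number) {           /* a value just ended: flush it */
                if (n == cap) return -1;
                out[n++] = (unsigned char)acc;
                acc = 0;
                in_number = 0;
            }
            if (*s == '\0') break;
        } else {
            return -1;                 /* neither digit, space, nor terminator */
        }
    }
    return (int)n;
}
```

Calling parse_hex_bytes("E8 48 D8 FF FF 8B 0D", buf, sizeof buf) with a buffer of at least seven bytes should fill it with the bytes from the question and return 7.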
Ben Voigt
+1: This is probably the most straightforward and simple way to do it.
James McNellis
That's basically what I did, except that I used a switch instead of the ternary test. Depending on the compiler and processor architecture, one or the other may be faster. But you should also check that every character is in the range 0-9A-F, and with the ternary that means testing the same thing twice.
kriss
@kriss: It's all in the assumptions. You assume that there must be exactly two hex digits and one space between each value, mine allows omission of a leading zero or multiple spaces, but assumes that there are no other classes of characters in the string. If you can't assume that, I'd probably choose to do validation separately, by testing `if (s[strspn(s, " 0123456789ABCDEF")]) /* error */;` Sure, it's another pass on the string, but so much cleaner. Or avoid the second pass over the string by using `isspace` and `isxdigit` on each character, which uses a lookup table for speed.
Ben Voigt
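
To make the two validation options from the comment above concrete, here is a small sketch. The function names are invented for this example. Note that the isxdigit/isspace variant is slightly more permissive than the strspn one: it also accepts lowercase digits and whitespace other than the space character.

```c
#include <ctype.h>
#include <string.h>

/* Separate validation pass: returns 1 if s holds only spaces and
   uppercase hex digits, 0 otherwise. strspn yields the length of the
   leading run of accepted characters; if that run stops before the
   terminator, some other character is present. */
static int only_hex_and_spaces(const char *s)
{
    return s[strspn(s, " 0123456789ABCDEF")] == '\0';
}

/* Per-character variant using the <ctype.h> classifiers. The cast to
   unsigned char avoids undefined behaviour for negative char values. */
static int only_xdigits_and_whitespace(const char *s)
{
    for (; *s; s++) {
        unsigned char ch = (unsigned char)*s;
        if (!isxdigit(ch) && !isspace(ch)) return 0;
    }
    return 1;
}
```

Either check could run before the conversion loop, so that the loop itself can assume clean input.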
@Ben: Looping around switches is not really an issue; I don't see it as a real difference. I chose to assume there were exactly two hex chars per value in the input, because if you allow more than that you should also range-check the values. And what about allowing negative numbers? We would have to manage the sign, etc. A switch *is* a kind of lookup table... (and another fast conversion method would be to really use one implemented as an array).
kriss
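
The array-based lookup table mentioned in the comment above might look like the sketch below. All names here (hex_table, init_hex_table, hex_pair_to_byte) are hypothetical, and the table only knows uppercase digits, matching the sample input.

```c
#include <limits.h>

/* One entry per possible unsigned char value: the digit's value, or -1. */
static signed char hex_table[UCHAR_MAX + 1];

/* Fill the table once before the first conversion. */
static void init_hex_table(void)
{
    const char *digits = "0123456789ABCDEF";
    int i;
    for (i = 0; i <= UCHAR_MAX; i++) hex_table[i] = -1;
    for (i = 0; digits[i]; i++)
        hex_table[(unsigned char)digits[i]] = (signed char)i;
}

/* Convert exactly two hex characters at p into a byte value (0..255),
   or return -1 if either character is not an uppercase hex digit. */
static int hex_pair_to_byte(const char *p)
{
    int hi = hex_table[(unsigned char)p[0]];
    int lo;
    if (hi < 0) return -1;   /* also stops at the terminator before reading p[1] */
    lo = hex_table[(unsigned char)p[1]];
    if (lo < 0) return -1;
    return hi * 16 + lo;
}
```

After one call to init_hex_table(), each digit costs a single array read, and validation comes for free because invalid characters map to -1.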
The problem specified that all inputs were unsigned. The problem didn't specify that there would always be zeros padding to exactly two digits (e.g. all of these fit in a `char`: `0xA`, `0x0A`, `0x000A`) or just one space, although these assumptions were true on the sample input.
Ben Voigt
A: 

The old C way, do it by hand ;-) (there are many shorter ways, but I'm not golfing, I'm going for run-time).

enum { NBBYTES = 7 };
char res[NBBYTES+1];
const char * c = "E8 48 D8 FF FF 8B 0D";
const char * p = c;
int i = 0;

for (i = 0; i < NBBYTES; i++){
    switch (*p){
    case '0': case '1': case '2': case '3': case '4':
    case '5': case '6': case '7': case '8': case '9':
      res[i] = *p - '0';
    break;
    case 'A': case 'B': case 'C': case 'D': case 'E': case 'F':
      res[i] = *p - 'A' + 10;
    break;
   default:
     // parse error, throw exception
     ;
   }
   p++;
   switch (*p){
   case '0': case '1': case '2': case '3': case '4':
   case '5': case '6': case '7': case '8': case '9':
      res[i] = res[i]*16 + *p - '0';
   break;
   case 'A': case 'B': case 'C': case 'D': case 'E': case 'F':
      res[i] = res[i]*16 + *p - 'A' + 10;
   break;
   default:
      // parse error, throw exception
      ;
   }
   p++;
   if (*p == 0) { continue; }
   if (*p == ' ') { p++; continue; }
   // parse error, throw exception
}

// let's show the result, C style IO, just cout if you want C++
for (i = 0 ; i < NBBYTES; i++){
   printf("%2.2x ", 0xFF & res[i]);
}
printf("\n");

Now another one that allows any number of digits per value and any number of spaces between values, including leading or trailing spaces (Ben's specs):

#include <stdio.h>
#include <stdlib.h>

int main(){
    enum { NBBYTES = 7 };
    char res[NBBYTES];
    const char * c = "E8 48 D8 FF FF 8B 0D";
    const char * p = c;
    int i = -1;        // no byte started yet; the first digit run moves this to 0

    char ch = ' ';     // start as if a space had just been read
    while (ch && i < NBBYTES){
       switch (ch){
       case '0': case '1': case '2': case '3': case '4':
       case '5': case '6': case '7': case '8': case '9':
          ch -= '0' + 10 - 'A';
       case 'A': case 'B': case 'C': case 'D': case 'E': case 'F':
          ch -= 'A' - 10;
          res[i] = res[i]*16 + ch;
          break;
       case ' ':
         if (*p != ' ' && *p != '\0') {  // a space right before the terminator starts no new byte
             if (i == NBBYTES-1){
                 printf("parse error, throw exception\n");
                 exit(-1);
            }
            res[++i] = 0;
         }
         break;
       case 0:
         break;
       default:
         printf("parse error, throw exception\n");
         exit(-1);
       }
       ch = *(p++);
    }
    if (i != NBBYTES-1){
        printf("parse error, throw exception\n");
        exit(-1);
    }

   for (i = 0 ; i < NBBYTES; i++){
      printf("%2.2x ", 0xFF & res[i]);
   }
   printf("\n");
}

No, it's not really obfuscated... but well, it looks like it is.

kriss
Are we allowed to say 'Ick!'? (If only because the code will 'throw exception' on the last loop, because there are only 6 spaces in the string, not 7 as the code requires.)
Jonathan Leffler
@Jonathan: not any more... I could also have added a space to input. The old separators vs terminators debate.
kriss
your little fix doesn't help... `*p != ' '` on the terminating NUL and it doesn't matter what you logical-or that with.
Ben Voigt
Oops, I erred again. You should like the new fix better :-)
kriss
Validity check is still flaky.
Ben Voigt
@Ben: be patient, I haven't posted any change yet...
kriss
A: 

For a pure C implementation I think you can persuade sscanf(3) to do what you want. I believe this should be portable (including the slightly dodgy type coercion to appease the compiler) so long as your input string is only ever going to contain two-character hex values.

#include <stdio.h>
#include <stdlib.h>
#include <string.h>

char hex[] = "E8 48 D8 FF FF 8B 0D";
char *p;
size_t cnt = (strlen(hex) + 1) / 3; // Whether or not there's a trailing space
unsigned char *result = malloc(cnt), *r;
unsigned int c; // %X stores a full unsigned int, so it cannot point at a char

for (p = hex, r = result; *p; ) {
    if (sscanf(p, "%2X", &c) != 1) {
        break; // Didn't parse as expected
    }
    *r++ = (unsigned char)c;
    p += 2;             // Step over the two hex digits
    if (*p == ' ') p++; // Step over the separator, never past the terminator
}
bjg
Declare `c` as `unsigned int`, otherwise you could overwrite other local variables (or worse yet, your return address).
Ben Voigt
But generally scanf is going to take longer to figure out the format code than my entire answer will, and the question did ask for an *efficient* way.
Ben Voigt
@Ben Voigt. Yes, but does efficient mean run-time or programmer-time? '-) Anyway, thanks for pointing out that I should have made `c` an `unsigned int` and coerced that into the `result` array.
bjg
+1  A: 

You'll never convince me that this operation is a performance bottleneck. The efficient way is to make good use of your time by using the standard C library:

#include <assert.h>
#include <ctype.h>
#include <stdlib.h>
#include <string.h>

static unsigned char gethex(const char *s, char **endptr) {
  assert(s);
  while (isspace((unsigned char)*s)) s++;
  assert(*s);
  return strtoul(s, endptr, 16);
}

unsigned char *convert(const char *s, int *length) {
  /* Sized for two-digit values separated by single spaces. */
  unsigned char *answer = malloc((strlen(s) + 1) / 3);
  unsigned char *p;
  for (p = answer; *s; p++)
    *p = gethex(s, (char **)&s);
  *length = p - answer;
  return answer;
}

Compiled and tested. Works on your example.

Norman Ramsey
I chose this as the answer because it simply provided a working example. Thanks!
Gbps
OTOH, buffer overflow on "A B C D E F 1 2 3 4 5 6 7 8 9".
Ben Voigt
Much simpler: `for (i=0; i<max i++) a[i]=strtol(s, ` The point being, your `gethex` function is totally redundant. `strtol` skips leading whitespace itself. If you want to be more strict about not accepting strings that don't match the pattern, you could use sscanf to control field width and measure match lengths.
R..
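
As a rough illustration of the point that strtoul skips leading whitespace on its own, here is one way a loop built directly on it could look. This is a sketch, not the code from the comment above: the function name parse_with_strtoul and the caller-supplied capacity are assumptions made for the example. Be aware that strtoul with base 16 also accepts an optional sign and a "0x" prefix, so it is more lenient than a strict two-digit parser.

```c
#include <ctype.h>
#include <stddef.h>
#include <stdlib.h>

/* Parse whitespace-separated hex values from s into out (capacity cap).
   Returns the number of bytes stored, or -1 if a value exceeds 0xFF,
   the buffer is too small, or unparsable text remains. */
static int parse_with_strtoul(const char *s, unsigned char *out, size_t cap)
{
    size_t n = 0;

    for (;;) {
        char *end;
        unsigned long v = strtoul(s, &end, 16);  /* skips leading whitespace itself */
        if (end == s) break;                     /* nothing converted: end of input or junk */
        if (v > 0xFF || n == cap) return -1;
        out[n++] = (unsigned char)v;
        s = end;                                 /* continue right after the parsed value */
    }

    /* Only trailing whitespace may remain at this point. */
    while (isspace((unsigned char)*s)) s++;
    return *s == '\0' ? (int)n : -1;
}
```

Because the caller passes the capacity, an input such as "A B C D E F 1 2 3 4 5 6 7 8 9" is rejected when the buffer is too small instead of overflowing it.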
@R: great point about strtoul---I didn't read the man page carefully enough. Feel free to edit.
Norman Ramsey