views:

500

answers:

8

Hey everyone, just a quick thing, I have the hex to integer working, but I need to get the numbers lowercase. Here's what I have, any ideas to get to get the A thru F case insensitive?

int htoi(char f[]) {
    int  z, n;
    n = 0;

    for (z = 0; f[z] >= '0' && f[z] <= 'F'; ++z) 
        if (f[z] >= 'A' && f[z] <= 'F')
            n = 10 + 16 * n + (f[z] - 'A');
        else
            n = 16 * n + (f[z] - '0');
}

Probably just a small thing, but I'd like to include a-f and A-F. Thanks for your help!

+2  A: 

Replace all f[z] with a dedicated variable. Assign that variable with toupper(f[z])

Arkadiy
A: 

Two options:

Convert to upper case before you do your scan.

Add a second if in the four loop that handles lower case.

stonemetal
+4  A: 

Create another function which turns a hex digit to an integer:

int hex_digit_to_integer(char digit) {
    if (digit >= 'A' && digit <= 'F') {
        return digit - 'A' + 10;
    } else if (digit >= 'a' && digit <= 'f') {
        return digit - 'a' + 10;
    } else if (digit >= '0' && digit <= '9') {
        return digit - '0';
    }

    return -1; // Bad input.
}

Notice how it handles four cases: * digit is an upper-case letter A..F, * digit is a lower-case letter a..f, * digit is a decimal digit 0..9, and * digit is none of the above.

Now use the new function in your original function:

int htoi(char f[]) {
    int z, n;
    n = 0;

    /* Loop until we get something which isn't a digit (hex_digit_to_integer returns something < 0). */
    for (z=0; hex_digit_to_integer(f[z]) >= 0; ++z) {
        n = 16 * n + hex_digit_to_integer(f[z]);
    }
}

Notice how much cleaner the new function looks?

If you're adventurous, you can use this magic function (which doesn't handle bad input, so you need to check that beforehand):

int hex_digit_to_integer(char digit) {
    return digit - (digit & 64 ? 55 : 48) & 15;
}
strager
I personally prefer the `toupper()` / `tolower()` solutions, but +1 for abstractions.
Chris Lutz
@Lutz, I thought the same thing, but I thought refactoring like this would be more useful. Throwing `tolower` into the mix may be a bit too much. ;P
strager
+7  A: 

If you're doing this to learn how to do it, ignore this post. If you're using this function because you need to convert a string of hex numbers to an int, you should take a walk in your standard library. The standard function strtol() converts a string to a long, which can be cast down to an int (or an unsigned int while were at it). The third argument is the base to convert to - in this case, you would want base 16 for hexadecimal. Also, if given base 0, it will assume hex if the string begins with 0x, octal if it begins with 0, and decimal otherwise. It's a very useful function.


EDIT: Just noticed this, but while we're here, it's worth mentioning that you should generally not use an int to index arrays. The C standard defines a type, called size_t, which is designed to store array indices. It is generally an unsigned int or unsigned long or something, but is guaranteed to be big enough to store any array or pointer offset you can use.

The problem with using just an int is that, theoretically, maybe, someday, someone could pass a string longer than INT_MAX, and then your int will overflow, probably wrap around, and start reading memory it probably shouldn't because it's using a negative index. This is highly unlikely, especially for a function like this, because the int value you return will overflow long before your int counter overflows, but it is an important thing to keep in mind.

To be technically correct, you should only use size_t type variables to index arrays, or at least only use unsigned types, unless you really want to try to access negative elements (which is usually a bad idea unless you know what you're doing). However, it's not a huge issue here.

Chris Lutz
`sscanf` can do the same thing, and more. =] +1 though.
strager
`sscanf()` is also overkill here. But yes, I believe `sscanf()` will actually be calling `strtol()` under-the-hood in some cases. And `sscanf()` is a good function, except for it's lack of bounds-checking. =(
Chris Lutz
@Lutz, Bounds-checking? You mean for `%s` and things? You can specify a maximum width (in the string or as an arg) of the captured string.
strager
I meant for the string you're scanning. I'm working on a string library (just for fun, maybe it'll be high-quality some day, who knows) and want to implement some `sscanf()` -like functions for my string type, but my string type can contain arbitrary binary data, and is not guaranteed to be nul-terminated. So either I roll my own `sscanf()` function, or I use non-standard versions. Or I'm horribly unsafe (which is how it is currently for lack of a better solution - not planning to release it with that in there).
Chris Lutz
You have bigger problems than using the wrong data type if your strings are longer than INT_MAX :-) I'm reminded of a lecture once where the lecturer said the sun would die in about 5 billion years. An audience member jumped up in fear and asked him to repeat that. The lecturer did so and the audience member said "Thank God, I though you said five *million*".
paxdiablo
@Pax - This is true, and I said as much, but it's still a technicality. It might be arguably "cleaner" to use pointer arithmetic.
Chris Lutz
A: 

Try this instead:

int htoi (char f[]) {
    int  z, n;
    n = 0;
    for (z = 0; f[z] != '\0'; ++z) { 
        if (f[z] >= '0' && f[z] <= '9') {
            n = n * 16 + f[z] - '0';
        } else {
            if (f[z] >= 'A' && f[z] <= 'F') {
                n = n * 16 + f[z] - 'A' + 10;
            } else {
                if (f[z] >= 'a' && f[z] <= 'f') {
                    n = n * 16 + f[z] - 'a' + 10;
                } else {
                    break;
                }
            }
        }
    }
    return n;
}

It still treats the input the same way as yours (I'd tend to use pointers but they are sometimes hard to understand by a beginner) but introduces three separate cases, 0-9, A-F and a-f, treating each appropriately.

Your original code would actually allow erroneous characters (the six between '9' and 'A') and produce incorrect results based on them.

Note that this new code only normally terminates the loop at end of string. Finding an invalid hex character will break out of the loop, functionally identical to your terminating condition.

paxdiablo
You know you can change `else { if { } }` to `else if { }` right?
Chris Lutz
Ahhhh pyramid!
strager
I'm aware of that, @Chris. I still prefer braces on them all since it's so much easier to add code later without having to worry about forgetting to add the braces. It's just my coding style.
paxdiablo
Well, I wish you and your style the best of luck as long as I don't have to collaborate with you on anything. :P
Chris Lutz
The problem isn't bracing, it's indention level. The more indention levels there are, the more difficult it is to follow the code (in general) due to more (perceived) code paths.
strager
I see no problem with three levels of indentation. And, if there were more, I wouldn't be removing braces, I'd be moving some of the lower levels into separate functions and calling them.
paxdiablo
You seriously think you might want to add code later, in one of those else blocks?
James Antill
@James, no, probably not those ones. But I still tend to plan for it anyway. Consistency makes my life easier. Who's to say I won't add some debug statements or a logging call later on? In any case, the braces are irrelevant to the question. You may as well comment on the number of spaces for indentation or putting the opening brace on the same line as the if statement.
paxdiablo
+1  A: 

Here is some code from the NPS NSRL Bloom package:

static int *hexcharvals = 0;

/** Initialization function is used solely for hex output
 */
static void nsrl_bloom_init()
{
    if(hexcharvals==0){
        /* Need to initialize this */
        int i;
        hexcharvals = calloc(sizeof(int),256);
        for(i=0;i<10;i++){
            hexcharvals['0'+i] = i;
        }
        for(i=10;i<16;i++){
            hexcharvals['A'+i-10] = i;
            hexcharvals['a'+i-10] = i;
        }
    }
}

/**
 * Convert a hex representation to binary, and return
 * the number of bits converted.
 * @param binbuf output buffer
 * @param binbuf_size size of output buffer in bytes.
 * @param hex    input buffer (in hex)
 */
int nsrl_hex2bin(unsigned char *binbuf,size_t binbuf_size,const char *hex)
{
    int bits = 0;
    if(hexcharvals==0) nsrl_bloom_init();
    while(hex[0] && hex[1] && binbuf_size>0){
        *binbuf++ = ((hexcharvals[(unsigned char)hex[0]]<<4) |
                     hexcharvals[(unsigned char)hex[1]]);
        hex  += 2;
        bits += 8;
        binbuf_size -= 1;
    }
    return bits;
}

This code is designed to be super-fast, handle both upper-case and lower-case hex, and handle hex strings of any length. The function nsrl_hex2bin() takes a binary buffer, the size of that buffer, and the hex string you want to convert. It returns the number of bits that actually got converted.

Oh, if you want just an integer, then you can multiply out the bytes (for endian-independent code), or just do a cast (for endian-dependent code).

vy32
I don't see the point of trying to micro-optimize a function when the exact same functionality is provided (in assembly, probably optimized more than you could ever imagine) by the C standard library.
Chris Lutz
If you mean sscanf() that has a lot of overhead, just like *printf()
James Antill
I meant `strtol()` which has none of the overhead of `*scanf()` or `*printf()` - not that this overhead is really significant most of the time. If format strings are the bottleneck of your application, you're too good at your job.
Chris Lutz
Chris - the C standard library doesn't convert a blocks of data. How would you convert a 160-bit SHA1 code to a binary structure with the C library code?
vy32
+1  A: 

You could try sscanf instead:

#include <stdio.h>

...

//NOTE: buffer overflow if f is not terminated with \0 !!
int htoi(char f[]){
  int intval = -1;
  if (EOF == sscanf(f, "%x", &intval))
    return -1; //error
  return intval;
}
Kip
A: 

Use strtol() please. This is the standard C90 function and much more powerful than most of naive implementations. It also supports seamless conversion from dec (no prefix), hex (0x) and oct (starting with 0).