I'm learning C and I can't figure out one of the K&R exercises:

Exercise 2-3. Write the function htoi(s), which converts a string of hexadecimal digits (including an optional 0x or 0X) into its equivalent integer value. The allowable digits are 0 through 9, a through f and A through F.

I suppose I need to use recursion here. I just don't know much about the number types and the various ways of converting between them.

Could someone give me some pointers on how best to understand it? I'm not looking for someone to hold my hand, but instead direct me to a means of proper understanding so I can write this in the most elegant form possible, and not with printf("%x", skill);

Thanks :)

+9  A: 

Recursion is not necessary. You simply need to loop backwards over the string (i.e. starting in the units column), adding each digit's value times its positional multiplier (a power of 16). This is pseudocode: it does not handle the optional 0x prefix and does not check for overflow:

long total = 0;
long multiplier = 1;
for (int i = string.length - 1; i >= 0; i--)
{
   digit = ConvertSingleHexDigittoInt(string[i]);
   total += digit * multiplier;
   multiplier *= 16;
}

I've left the easy implementation of ConvertSingleHexDigittoInt() to you :)
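For anyone who wants to check their own version afterwards, here is a minimal C sketch of the same right-to-left idea. `hex_digit_value()` is a hypothetical stand-in for `ConvertSingleHexDigittoInt()`, and `htoi_right_to_left()` is likewise an illustrative name. Like the pseudocode, it assumes every character is a valid hex digit, skips the 0x prefix, and does no overflow checking; it also assumes the letters are contiguous, as they are in ASCII.

```c
#include <string.h>

/* Hypothetical helper (stand-in for ConvertSingleHexDigittoInt):
   returns 0..15 for a hex digit, or -1 for anything else.
   Assumes 'a'..'f' and 'A'..'F' are contiguous, as in ASCII. */
int hex_digit_value(char c)
{
    if (c >= '0' && c <= '9')
        return c - '0';
    if (c >= 'a' && c <= 'f')
        return c - 'a' + 10;
    if (c >= 'A' && c <= 'F')
        return c - 'A' + 10;
    return -1;
}

/* Right-to-left conversion: start in the units column and grow the
   multiplier by a factor of 16 for each column further left.
   Assumes every character is a valid hex digit (no 0x, no validation). */
unsigned long htoi_right_to_left(const char s[])
{
    unsigned long total = 0;
    unsigned long multiplier = 1;   /* 16^column, starting at the units column */
    int i;

    for (i = (int)strlen(s) - 1; i >= 0; i--) {
        total += (unsigned long)hex_digit_value(s[i]) * multiplier;
        multiplier *= 16;           /* unsigned, so the last, unused step cannot trap */
    }
    return total;
}
```

For example, `htoi_right_to_left("1A2")` gives 2 * 1 + 10 * 16 + 1 * 256 = 418.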

Mitch Wheat
Would the downvoter please leave a comment as to why? Thanks.
Mitch Wheat
-1, because this is not a valid C answer to a dedicated C beginner's question.
quinmars
I am not trying to give the poster the easy answer. I'm trying to encourage them to solve it themselves by giving them a helping hand. I don't believe that deserves a downvote.
Mitch Wheat
In the poster's own words: "I'm not looking for someone to hold my hand, but instead direct me to a means of proper understanding"
Mitch Wheat
And you made it clear that your example was pseudocode, not C, so +1 to counteract the downvote.
LukeH
Maybe, maybe not. C can be tough enough to learn. Confusing the asker with pseudo-C code doesn't help much, IMHO. I removed my downvote; however, I still think it confuses more than it helps.
quinmars
@quinmars, I'm with the other guys. If anything (as you'll see from my attempt at an answer) I wouldn't make the pseudocode look that much like C. I've been teaching programming to newbies for about 30 years, and if there's anything I have learned, it's that the exact code solution alone doesn't help much.
Charlie Martin
@Charlie Martin, right, I agree with you and also with Mitch Wheat: presenting the solution in its exact form doesn't teach him much. My concern was that the pseudo-C code (a mixture of Java and C++) is going to confuse the asker more than the actual question. But my first reaction was a bit harsh, hence I removed my downvote.
quinmars
+4  A: 
Charlie Martin
That looks almost the same as my original answer! ;)
Mitch Wheat
yeah, and then you went and gave the game away. You're no fun.
Charlie Martin
A: 

Yesterday I wrote a function like this. You can see my code below.

#include <limits.h>

/* Converting a hex string to integer, assuming the leading
   0x or 0X has already been removed and pch is not NULL.
   Stops at the first non-hex character; returns INT_MAX on overflow. */
int hex_str_to_int(const char* pch) {

    int value = 0;
    int digit = 0;

    for (; *pch; ++pch) {

        if (*pch >= '0' && *pch <= '9') {
            digit = (*pch - '0');
        } else if (*pch >= 'A' && *pch <= 'F') {
            digit = (*pch - 'A' + 10);
        } else if (*pch >= 'a' && *pch <= 'f') {
            digit = (*pch - 'a' + 10);
        } else {
            break;
        }

        // Check for integer overflow *before* it happens: signed overflow is
        // undefined behavior, so testing for a negative result afterwards
        // is not reliable.
        if (value > (INT_MAX - digit) / 16) {
            return INT_MAX;
        }
        value = value * 16 + digit;
    }

    return value;
}

Here is the testing code:

#include <limits.h>
#include <stdio.h>

int main(void) {

    printf("%d %d\n", hex_str_to_int("0"), 0x0);
    printf("%d %d\n", hex_str_to_int("A"), 0xA);
    printf("%d %d\n", hex_str_to_int("10"), 0x10);
    printf("%d %d\n", hex_str_to_int("A1"), 0xA1);
    printf("%d %d\n", hex_str_to_int("AB"), 0xAB);
    printf("%d %d\n", hex_str_to_int("100"), 0x100);
    printf("%d %d\n", hex_str_to_int("1A2"), 0x1A2);
    printf("%d %d\n", hex_str_to_int("10A"), 0x10A);
    printf("%d %d\n", hex_str_to_int("7FFFFFF"), 0x7FFFFFF);
    printf("%d %d\n", hex_str_to_int("7FFFFFF1"), 0x7FFFFFF1);
    printf("%d %d\n", hex_str_to_int("7FFFFFF2"), 0x7FFFFFF2);
    printf("%d %d\n", hex_str_to_int("7FFFFFFE"), 0x7FFFFFFE);
    printf("%d %d\n", hex_str_to_int("7FFFFFFF"), 0x7FFFFFFF);
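    /* 0x7FFFFFFF + 1 would overflow int (undefined behavior); INT_MIN is the
       value a 32-bit two's-complement wraparound would show. */
    printf("%d %d\n", hex_str_to_int("80000000"), INT_MIN);
    printf("%d %d\n", hex_str_to_int("80000001"), INT_MIN + 1);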

    printf("%d %d\n", hex_str_to_int("10AX"), 0x10A);   
    printf("%d %d\n", hex_str_to_int("203!"), 0x203);

    return 0;
}

It outputs the following values:

0 0
10 10
16 16
161 161
171 171
256 256
418 418
266 266
134217727 134217727
2147483633 2147483633
2147483634 2147483634
2147483646 2147483646
2147483647 2147483647
2147483647 -2147483648
2147483647 -2147483647
266 266
515 515
yinyueyouge
This is wrong. This treats 'a' as 0 instead of 10, and likewise up to treating 'f' as 5 instead of 15.
Adam Rosenfield
thanks for commenting. bug fixed.
yinyueyouge
This is wrong. This converts "100" to have the value 1, not 256. That is, your loop is over the hex digits from last to first while multiplying the value by 16 each time you add a digit.
RBerteig
The check for integer overflow is of limited value, and incorrect to boot. The string `"1"` is a valid, non-overflowed conversion, but overflows because `1 < '1'` which is the test in the second condition for overflow.
RBerteig
thanks for commenting. bug fixed.
yinyueyouge
A: 

A conventional approach converts from left to right. An accumulator is set to zero at the beginning and, on each pass through the loop, multiplied by 16 before the value of the new digit is added.

For an htoi() function that expects hexadecimal digits with an optional leading 0x, begin by skipping past those characters if present. Directly checking the values of s[0] and s[1] is probably the clearest approach there.

If you know the digits are in ASCII, then you can use expressions like s[i] - '0' and s[i] - 'A' + 10 to convert the i-th digit to its integer value.

You probably want to fold everything to a single case for sanity.
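Putting those pieces together, a minimal sketch might look like the following. It uses array indexing rather than pointers, folds case with `toupper()` from `<ctype.h>`, and assumes an ASCII-compatible character set (contiguous 'A' to 'F'). It stops at the first character that is not a hex digit and does not check for overflow; both are choices of this sketch, not requirements of the exercise.

```c
#include <ctype.h>

/* Left-to-right htoi() sketch. Assumes ASCII-like contiguous letters,
   stops at the first non-hex character, and does not check for overflow. */
int htoi(const char s[])
{
    int i = 0;
    int n = 0;

    /* Skip an optional 0x or 0X prefix by looking at s[0] and s[1]. */
    if (s[0] == '0' && (s[1] == 'x' || s[1] == 'X'))
        i = 2;

    for (; s[i] != '\0'; i++) {
        int c = toupper((unsigned char)s[i]);  /* fold to one case */
        int digit;

        if (c >= '0' && c <= '9')
            digit = c - '0';
        else if (c >= 'A' && c <= 'F')
            digit = c - 'A' + 10;
        else
            break;                             /* not a hex digit: stop */

        n = 16 * n + digit;  /* move the accumulator one column left, then add */
    }
    return n;
}
```

With this version, `htoi("0x1F")` is 31 and `htoi("12G")` stops at the `G` and returns 18.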

Edit: Changed *s to s[i] for consistency with the observation that pointers are from the future from the point of view of this exercise.

Note that there are several other ways to convert the individual digits to values. For example, you could look them up in a vector of all digits (something like strchr("0123456789ABCDEF",s[i])), build a single lookup table indexed by character code with the value of each digit at each position (digitvalue[s[i]] after int digitvalue[256] has been suitably initialized), use a switch (s[i]) statement with a case label for each possible digit as suggested in another answer, or use the range checks and arithmetic as I suggest above. Something to think about is which to choose, and why. Notice that it may not be an obvious choice, and the best answer may be different if ASCII is not your character set of choice.
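To make two of those alternatives concrete, here is a rough sketch of the `strchr()` search and the `digitvalue[]` table. The function names `digit_by_search()`, `init_digitvalue()` and `digit_by_table()` are made up for illustration. Note two traps: `strchr()` also matches the terminating `'\0'`, and a plain `char` used as an index must be cast to `unsigned char` because it may be signed. The table is filled from the strings "abcdef" and "ABCDEF" rather than by `'a' + i` arithmetic, so it does not rely on the letters being contiguous; only `'0'` to `'9'` are guaranteed contiguous by the C standard.

```c
#include <ctype.h>
#include <limits.h>
#include <string.h>

/* Search approach: position in a string of all digits, or -1. */
int digit_by_search(char c)
{
    static const char digits[] = "0123456789ABCDEF";
    const char *p;

    if (c == '\0')
        return -1;  /* strchr() would otherwise match the terminator */
    p = strchr(digits, toupper((unsigned char)c));
    return p ? (int)(p - digits) : -1;
}

/* Table approach: one entry per character code, -1 for non-digits. */
static int digitvalue[UCHAR_MAX + 1];

void init_digitvalue(void)
{
    static const char lower[] = "abcdef", upper[] = "ABCDEF";
    int i;

    for (i = 0; i <= UCHAR_MAX; i++)
        digitvalue[i] = -1;
    for (i = 0; i < 10; i++)
        digitvalue['0' + i] = i;  /* '0'..'9' are contiguous in every C charset */
    for (i = 0; i < 6; i++) {
        digitvalue[(unsigned char)lower[i]] = 10 + i;
        digitvalue[(unsigned char)upper[i]] = 10 + i;
    }
}

int digit_by_table(char c)
{
    return digitvalue[(unsigned char)c];  /* cast: plain char may be signed */
}
```

The table costs some memory and an initialization step but makes each lookup a single array access; the search needs no setup but scans up to 16 characters per digit.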

RBerteig
+3  A: 

Processing the string from left to right is simpler and arguably more readable for those comfortable with math. The strategy is realizing that, for example, 1234 = (((1 x 10) + 2) x 10 + 3) x 10 + 4

In other words, as you process each digit from left to right, multiply the previous total by the base, effectively "moving it left" one position, then add the new digit.

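#include <string.h>

long decFromHexChar(char hexChar);  // converts one hex digit; left for you to write

long decFromHexStr(const char *hexStr)
{
    size_t i;
    size_t len = strlen(hexStr);  // compute the length once, not on every pass
    long decResult = 0;           // accumulated result

    for (i = 0;  i < len;  ++i)
    {
        decResult = 16 * decResult + decFromHexChar(hexStr[i]);
    }
    return decResult;
}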

Experienced programmers would probably use a pointer to step through the string instead of treating it as an array:

long decFromHexStr(const char *pHex)
{
    long decResult = 0;

    while (*pHex != '\0')
    {
        decResult = 16 * decResult + decFromHexChar(*pHex++);
    }
    return decResult;
}
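Both versions rely on `decFromHexChar()`, which the answer doesn't define. One possible definition is sketched below together with the pointer version so the block compiles on its own. It assumes the character is a valid hex digit and that 'A' to 'F' are contiguous, as in ASCII.

```c
#include <ctype.h>

/* One possible decFromHexChar(): assumes c is a valid hex digit
   and an ASCII-compatible character set. */
long decFromHexChar(char c)
{
    if (isdigit((unsigned char)c))
        return c - '0';
    return toupper((unsigned char)c) - 'A' + 10;  /* 'a'/'A' -> 10, ... */
}

long decFromHexStr(const char *pHex)
{
    long decResult = 0;

    while (*pHex != '\0')
    {
        decResult = 16 * decResult + decFromHexChar(*pHex++);
    }
    return decResult;
}
```

Tracing `decFromHexStr("15FA")`: 1, then 16 + 5 = 21, then 21 * 16 + 15 = 351, then 351 * 16 + 10 = 5626.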

Since you're learning, it's worth studying the coding style and deciding whether you find it helpful or not, so you'll build good habits early.

Have fun!

Adam Liss
I just find the r-l version appealing because the radix step matches the definition, so I can do r *= base.
Charlie Martin
You could also split the l-r method into two steps: result *= base; result += nextDigit, which sort of mimics the behavior you see when entering numbers into a calculator. I suppose it's a matter of taste: 6 of one, C>>1 of another. :-)
Adam Liss
A: 

What does a hexadecimal number actually mean? Let's take 15FA. It means

1 * 16^3 + 5 * 16^2 + 15 * 16^1 + 10 * 16^0

Note that A represents ten, B eleven and so on up to F which represents fifteen. Also 16^0 is equal to 1.

So all we need to do is calculate the value of the above expression! The simplest way is probably to do it in this order:

10 * 1
15 * 16
5  * 256   //256  = 16 * 16
1  * 4096  //4096 = 16 * 16 * 16

This can continue further if there are more digits. All you really need is a loop and a few variables.

There is another method of doing it which is explained by factorising the above expression like this:

((1 * 16 + 5) * 16 + 15) * 16 + 10

If you wish, try each of these methods.
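If you want to compare your attempts, here is a rough C sketch of both methods side by side. The names `digit_value()`, `by_powers()` and `by_horner()` are illustrative, and the sketch assumes valid hex digits in an ASCII-compatible character set. The factorised form is commonly known as Horner's rule.

```c
#include <string.h>

/* Hypothetical helper: value of one hex digit (assumes valid input, ASCII). */
static unsigned long digit_value(char c)
{
    if (c >= '0' && c <= '9') return c - '0';
    if (c >= 'a' && c <= 'f') return c - 'a' + 10;
    return c - 'A' + 10;
}

/* Method 1: 10 * 1 + 15 * 16 + 5 * 256 + 1 * 4096, starting from the last digit. */
unsigned long by_powers(const char s[])
{
    unsigned long total = 0, power = 1;  /* power holds 16^k */
    size_t i = strlen(s);

    while (i > 0) {
        i--;
        total += digit_value(s[i]) * power;
        power *= 16;
    }
    return total;
}

/* Method 2: the factorised form ((1 * 16 + 5) * 16 + 15) * 16 + 10. */
unsigned long by_horner(const char s[])
{
    unsigned long total = 0;
    size_t i;

    for (i = 0; s[i] != '\0'; i++)
        total = total * 16 + digit_value(s[i]);
    return total;
}
```

Both return 5626 for "15FA". The second needs only one multiplication per digit, while the first needs two (one for the term, one to advance the power).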

More advanced information:

Basically, computers use base 2 (also called binary) for all their numbers and calculations. Even the string "1A6DC0" is encoded with 1s and 0s, which eventually get displayed on the screen as letters and numbers.

Sometimes you can take advantage of the fact that computers use binary, but usually you don't need to think about it.

For instance, when you do

x = (11 + y) * 6;

you don't need to worry that 11 and 6 will be represented as a series of high and low voltages at some stage. It just works as you expect. Converting from decimal (the number system we use) to binary and back is a simple process that computers can do easily, and so they do this for us automatically to make our work easier.

However, when converting between hexadecimal and binary, there is a shortcut. Since each hex digit corresponds to exactly four binary digits, you can simply convert each hex digit to binary individually, then string the results together.

For instance, 15FA would expand like this:

1 -> 0001
5 -> 0101
F -> 1111
A -> 1010
15FA -> 0001 0101 1111 1010

Note that in C you generally can't just paste the bit patterns together directly; doing it in code usually involves bitwise OR and bit shifts (| and <<). Fun stuff.
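A minimal sketch of that shift-and-OR idea: shift the result left by 4 bits (one hex digit) and OR the new digit into the now-empty low nibble. The name `hex_by_shifts()` is illustrative; it assumes valid hex digits, an ASCII-compatible character set, and at most 8 digits so the value fits in the 32 bits that `unsigned long` is guaranteed to have.

```c
/* Build the value 4 bits at a time: 0x1 -> 0x15 -> 0x15F -> 0x15FA.
   Assumes valid hex digits, ASCII-like letters, and at most 8 digits. */
unsigned long hex_by_shifts(const char s[])
{
    unsigned long r = 0;
    int i;

    for (i = 0; s[i] != '\0'; i++) {
        unsigned long nibble;

        if (s[i] >= '0' && s[i] <= '9')
            nibble = s[i] - '0';
        else if (s[i] >= 'a' && s[i] <= 'f')
            nibble = s[i] - 'a' + 10;
        else
            nibble = s[i] - 'A' + 10;

        r = (r << 4) | nibble;  /* make room for 4 bits, then fill them */
    }
    return r;
}
```

Because the low 4 bits are always zero after the shift, `|` and `+` give the same result here; OR just makes the "paste the bits in" intent explicit.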

Artelius
Side note: the factorised expression (left-to-right) method requires far fewer multiplications, so it's more efficient.
Artelius
+1  A: 

Hi, I'm probably not making a great contribution, since there are good answers above, but I'll give it a try.

As others did before me, I'm leaving some functionality for you to implement.

/* Helpers left for you to implement: */
unsigned int getPos(const char *digit);                       /* power of 16 for this digit */
int singleHexToDec(const char *digit, unsigned int position); /* digit value * 16^position */

int htoi(const char* x)
{

        unsigned int current_position;
        int dec=0;
        const char* y = x;

        if (x[0]=='0' && (x[1]=='x' || x[1]=='X')){  /*Is 0x or 0X prefix present?*/
                y+=2; /*Jumps over 0x or 0X*/
        }

        while (*y){
                current_position=getPos(y);
                dec+=singleHexToDec(y,current_position);
                y++; /*Move on to the next digit*/
        }
        return dec;
}
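For checking your own work later, here is one possible way to fill in the two helpers. Only their names and parameter lists come from the answer; the bodies are a hypothetical sketch that assumes "position" means the number of digits to the right of the current one, valid hex digits, an ASCII-compatible character set, and a result that fits in an int. The `htoi()` skeleton is included, advancing `y` after each digit so the loop terminates, so the block compiles on its own.

```c
#include <string.h>

/* Hypothetical getPos(): digits remaining to the right = power of 16. */
unsigned int getPos(const char *digit)
{
    return (unsigned int)strlen(digit) - 1;  /* called only when *digit != '\0' */
}

/* Hypothetical singleHexToDec(): value of *digit times 16^position.
   Assumes a valid hex digit and an ASCII-compatible character set. */
int singleHexToDec(const char *digit, unsigned int position)
{
    char c = *digit;
    int value;

    if (c >= '0' && c <= '9')
        value = c - '0';
    else if (c >= 'a' && c <= 'f')
        value = c - 'a' + 10;
    else
        value = c - 'A' + 10;

    while (position-- > 0)
        value *= 16;
    return value;
}

int htoi(const char *x)
{
    unsigned int current_position;
    int dec = 0;
    const char *y = x;

    if (x[0] == '0' && (x[1] == 'x' || x[1] == 'X'))
        y += 2;  /* skip 0x or 0X */

    while (*y) {
        current_position = getPos(y);
        dec += singleHexToDec(y, current_position);
        y++;
    }
    return dec;
}
```

Note that `getPos()` calls `strlen()` for every digit, so this design does quadratic work; the left-to-right accumulation in the other answers avoids that.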
Tom
A: 

I can't use pointers; K&R hasn't covered them yet.

+1  A: 

Hi, I'll try to explain in my rough English :(

My code (it assumes that all inputs are correct, to avoid defensive programming):

#include <stdio.h>


enum { SZ = 11 };

unsigned int htoi(const char *s);


int main(void)
{

  char buff[SZ];  //Max 11 char: 0x XX XX XX XX '\0' (2 + 8 + 1)

  /* %10s stops fscanf from writing past the end of buff */
  while(fscanf(stdin, "%10s", buff) == 1)
    printf("%X\n", htoi(buff) ); 

  return 0;
}


unsigned int htoi(const char *s)
{
  unsigned int i, r = 0;

  for(i = (s[1] == 'x' || s[1] == 'X') ? 2 : 0; s[i] != '\0'; i++)
    r = ( r << 4 ) +  ( (s[i] > '9') ? 0x9 : 0x0 ) + ( s[i] & 0xF );

  return r;
}

OK, first of all, set r = 0. Then, at the start of the for loop, we give the index variable i its initial value. We have to check whether the string has the 0x format or not, and we only need to look at position 1 to know that.

Now we have an index pointing to the first correct character! On each iteration we shift r 4 bits to the left, gaining 4 zero bits: a perfect gap in which to add a new hex digit. Example:

Input: 0xBE1234

Is s[1] == 'x' ? true then i = 2;
r = 0;

iter 1: r = 0x0; r = 0x0; r = 0xB;
iter 2: r = 0xB; r = 0xB0; r = 0xBE;
iter 3: r = 0xBE; r = 0xBE0; r = 0xBE1;
iter 4: r = 0xBE1; r = 0xBE10; r = 0xBE12;
iter 5: r = 0xBE12; r = 0xBE120; r = 0xBE123;
iter 6: r = 0xBE123; r = 0xBE1230; r = 0xBE1234

Maybe this line is a bit complicated:

 r = ( r << 4 ) + ( (s[i] > '9') ? 0x9 : 0x0 ) + ( s[i] & 0xF );

First, we shift r by 4 bits, which is the same as multiplying by 16. Then we check whether the ASCII character is greater than '9'. If it is, we are working with A, B, C, D, E, F or a, b, c, d, e, f. Remember, we assume that the input is correct. Now take a look at the ASCII table:

A = 0100 0001  -  a = 0110 0001
...
F = 0100 0110  -  f = 0110 0110

but we want something like this:

A = 0000 1010  -  a = 0000 1010
...
F = 0000 1111  -  f = 0000 1111

How do we do it? After the shift, we clear the 4 most significant bits with the mask s[i] & 0xF:

s[2] == 'B' == 0100 0010
s[2] & 0xF == 0000 0010

and add 9 to get the digit's integer value (only when s[i] is in { 'A' ... 'F', 'a' ... 'f' }):

(s[2] & 0xF) + 0x9 = 0000 0010 + 0000 1001 = 0000 1011 (0xB)

Finally, we add this to the shifted r and assign the result to r. Execution sequence for the second iteration (s[3]):

r == 0xB, s[3] == 'E' == 0100 0101 (start iter 2)
(r << 4) == 0xB0, s[3] == 'E' == 0100 0101 (shift r << 4)
(r << 4) == 0xB0, ((s[3] & 0xF) + 0x9) == 0000 1110 == 0xE (clear most significant bits of s[3] and add 0x9)
r = (r << 4) + ((s[3] & 0xF) + 0x9) == 0xBE == 1011 1110 (add all and assign to r)

What happens if we have a digit character like s[4]?

s[4] == '1' == 0011 0001
s[4] & 0xF == 0000 0001

Shift r four positions, add 0 (nothing), add the result of the bitwise operation s[i] & 0xF, and finally assign to r.

r == 0xBE, s[4] == '1' == 0011 0001 (start iter 3)
(r << 4) == 0xBE0, s[4] == '1' == 0011 0001 (shift r << 4)
(r << 4) == 0xBE0, ((s[4] & 0xF) + 0x0) == 0000 0001 (clear most significant bits of s[4] and add 0)
r = (r << 4) + (s[4] & 0xF) == 0xBE1 == 1011 1110 0001 (add all and assign)

Remember, we shift by 4 so the digits' bits never overlap: each new digit's bits are added into a gap of four zeros.

P.S. I promise to improve my English so I can explain better, sorry.
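If you want to convince yourself the digit trick is right, a small sketch can compare it against a plain conversion for all 22 valid hex digits. The function names are made up for illustration, and the check assumes ASCII. Note the parentheses in `trick_digit()`: in C, `+` binds tighter than `&`, so `c & 0xF + 0x9` would actually compute `c & 0x18`.

```c
/* The digit part of the trick: low nibble of the ASCII code, plus 9 for letters.
   Assumes ASCII and a valid hex digit. Parentheses matter: + binds before &. */
unsigned int trick_digit(char c)
{
    return ((c > '9') ? 0x9u : 0x0u) + ((unsigned char)c & 0xFu);
}

/* Straightforward reference conversion to compare against. */
unsigned int plain_digit(char c)
{
    if (c >= '0' && c <= '9') return (unsigned int)(c - '0');
    if (c >= 'a' && c <= 'f') return (unsigned int)(c - 'a' + 10);
    return (unsigned int)(c - 'A' + 10);
}

/* Returns 1 if the trick agrees with the reference for every hex digit. */
int trick_matches_all_digits(void)
{
    const char *digits = "0123456789abcdefABCDEF";
    int i;

    for (i = 0; digits[i] != '\0'; i++)
        if (trick_digit(digits[i]) != plain_digit(digits[i]))
            return 0;
    return 1;
}
```

On an ASCII system `trick_matches_all_digits()` should return 1; the trick depends on the specific ASCII codes, so it is not portable to other character sets.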

saxi