ansaurus

Question

Unexpected behavior when printing 4-byte integer byte by byte.

Answer 1

+5 A:

%d is for signed integers

%u is for unsigned integers

Edit:

Please modify your program as follows to see how your input is really interpreted:

#include <stdio.h>
int main()
{
 unsigned int c ; 
 unsigned char* cptr  = (unsigned char*)&c ;
 while(1)
 {
  scanf("%d",&c) ;
  printf("Signed value: %d\n",c);
  printf("Unsigned value: %u\n",c);
  printf("%u.%u.%u.%u \n",*cptr, *(cptr+1), *(cptr+2), *(cptr+3) );
 }
}

What happens when you supply a number greater than INT_MAX is the left-most bit is 1. This indicates that it is a signed integer with a negative value. The number is then interpreted as it's two's complement

hobodave 2010-01-09 07:55:45

Also you might want to consider some research about 2's complement.

Erkan Haspulat 2010-01-09 08:03:17

It could be interpreted as two's complement, or ones' complement, or whatever the underlying encoding is. Technically, `scanf()` could have done anything, since the behavior when the input doesn't fit in the data type is undefined.

Alok 2010-01-09 23:49:10

@Alok: you're right. I just went with the likely result.

hobodave 2010-01-10 03:53:42

Answer 2

+9 A:

You are coding 'sinfully' (making a number of mistakes which will hurt you sooner or later - mostly sooner). First off, you are assuming that the integer is of the correct endian-ness. On some machines, you will be wrong - either on Intel machines or on PowerPC or SPARC machines.

In general, you should show the actual results you get rather than just saying that you get the wrong result; you should also show the expected result. That helps people debug your expectations.

Here's my modified version of your code - instead of requesting input, it simply assumes the value you specified.

#include <stdio.h>
int main(void)
{
    unsigned int c = 2249459722;
    unsigned char* cptr  = (unsigned char*)&c;
    printf("Integer value:  %10u\n", c);
    printf("Integer value:  0x%08X\n", c);
    printf("Dotted decimal: %u.%u.%u.%u \n", *cptr, *(cptr+1), *(cptr+2), *(cptr+3));
    return(0);
}

When compiled on my Mac (Intel, little-endian), the output is:

Integer value:  2249459722
Integer value:  0x8614080A
Dotted decimal: 10.8.20.134

When compiled on my Sun (SPARC, big-endian), the output is:

Integer value:  2249459722
Integer value:  0x8614080A
Dotted decimal: 134.20.8.10

(Using GCC 4.4.2 on the SPARC, I get a warning:

xx.c:4: warning: this decimal constant is unsigned only in ISO C90

Using GCC 4.2.1 on Mac - with lots of warnings enabled (gcc -std=c99 -pedantic -Wall -Wshadow -Wpointer-arith -Wstrict-prototypes -Wmissing-prototypes -Werror) - I don't get that warning, which is interesting.) I can remove that by adding a U suffix to the integer constant.

Another way of looking at the problems is illustrated with the following code and the extremely fussy compiler settings shown above:

#include <stdio.h>

static void print_value(unsigned int c)
{
    unsigned char* cptr  = (unsigned char*)&c;
    printf("Integer value:  %10u\n", c);
    printf("Integer value:  0x%08X\n", c);
    printf("Dotted decimal: %u.%u.%u.%u \n", *cptr, *(cptr+1), *(cptr+2), *(cptr+3));
}

int main(void)
{
    const char str[] = "2249459722";
    unsigned int c = 2249459722;

    printf("Direct operations:\n");
    print_value(c);

    printf("Indirect operations:\n");
    if (sscanf("2249559722", "%d", &c) != 0)
        printf("Conversion failed for %s\n", str);
    else
        print_value(c);
    return(0);
}

This fails to compile (because of the -Werror setting) with the message:

cc1: warnings being treated as errors
xx.c: In function ‘main’:
xx.c:20: warning: format ‘%d’ expects type ‘int *’, but argument 3 has type ‘unsigned int *’

Remove the -Werror setting and it compiles, but then shows the next problem that you have - the one of not checking for error indications from functions that can fail:

Direct operations:
Integer value:  2249459722
Integer value:  0x8614080A
Dotted decimal: 10.8.20.134 
Indirect operations:
Conversion failed for 2249459722

Basically, the sscanf() function reports that it failed to convert the string to a signed integer (because the value is too large to fit - see the warning from GCC 4.4.2), but your code was not checking for the error return from sscanf(), so you were using whatever value happened to be left in c at the time.

So, there are multiple problems with your code:

It assumes a particular architecture (little-endian rather than recognizing that big-endian also exists).
It doesn't compile cleanly when using a compiler with lots of warnings enabled - for good reason.
It doesn't check that functions that can fail actually succeeded.

Alok's Comment

Yes, the test on sscanf() is wrong. That's why you have code reviews, and also why it helps to post the code you are testing.

I'm now a bit puzzled - getting consistent behaviour that I can't immediately explain. With the obvious revision (testing on MacOS X 10.6.2, GCC 4.2.1, 32-bit and 64-bit compilations), I get one not very sane answer. When I rewrite more modularly, I get a sane answer.

+ cat yy.c
#include <stdio.h>

static void print_value(unsigned int c)
{
    unsigned char* cptr  = (unsigned char*)&c;
    printf("Integer value:  %10u\n", c);
    printf("Integer value:  0x%08X\n", c);
    printf("Dotted decimal: %u.%u.%u.%u \n", *cptr, *(cptr+1), *(cptr+2), *(cptr+3));
}

int main(void)
{
    const char str[] = "2249459722";
    unsigned int c = 2249459722;

    printf("Direct operations:\n");
    print_value(c);

    printf("Indirect operations:\n");
    if (sscanf("2249559722", "%d", &c) != 1)
        printf("Conversion failed for %s\n", str);
    else
        print_value(c);
    return(0);
}


+ gcc -o yy.32 -m32 -std=c99 -pedantic -Wall -Wshadow -Wpointer-arith -Wstrict-prototypes -Wmissing-prototypes yy.c
yy.c: In function ‘main’:
yy.c:20: warning: format ‘%d’ expects type ‘int *’, but argument 3 has type ‘unsigned int *’


+ ./yy.32
Direct operations:
Integer value:  2249459722
Integer value:  0x8614080A
Dotted decimal: 10.8.20.134 
Indirect operations:
Integer value:  2249559722
Integer value:  0x86158EAA
Dotted decimal: 170.142.21.134

I do not have a good explanation for the value 170.142.21.134; but it is consistent on my machine, at the moment.

+ gcc -o yy.64 -m64 -std=c99 -pedantic -Wall -Wshadow -Wpointer-arith -Wstrict-prototypes -Wmissing-prototypes yy.c
yy.c: In function ‘main’:
yy.c:20: warning: format ‘%d’ expects type ‘int *’, but argument 3 has type ‘unsigned int *’


+ ./yy.64
Direct operations:
Integer value:  2249459722
Integer value:  0x8614080A
Dotted decimal: 10.8.20.134 
Indirect operations:
Integer value:  2249559722
Integer value:  0x86158EAA
Dotted decimal: 170.142.21.134

Same value - even in 64-bit instead of 32-bit. Maybe the problem is that I'm trying to explain undefined behaviour, which is more or less by definition unexplainable (inexplicable).

+ cat xx.c
#include <stdio.h>

static void print_value(unsigned int c)
{
    unsigned char* cptr  = (unsigned char*)&c;
    printf("Integer value:  %10u\n", c);
    printf("Integer value:  0x%08X\n", c);
    printf("Dotted decimal: %u.%u.%u.%u \n", *cptr, *(cptr+1), *(cptr+2), *(cptr+3));
}

static void scan_value(const char *str, const char *fmt, const char *tag)
{
    unsigned int c;
    printf("Indirect operations (%s):\n", tag);
    fmt = "%d";
    if (sscanf(str, fmt, &c) != 1)
        printf("Conversion failed for %s (format %s \"%s\")\n", str, tag, fmt);
    else
        print_value(c);
}

int main(void)
{
    const char str[] = "2249459722";
    unsigned int c = 2249459722U;

    printf("Direct operations:\n");
    print_value(c);
    scan_value(str, "%d", "signed");
    scan_value(str, "%u", "unsigned");

    return(0);
}

Using the function argument like this means GCC cannot spot the bogus format any more.

+ gcc -o xx.32 -m32 -std=c99 -pedantic -Wall -Wshadow -Wpointer-arith -Wstrict-prototypes -Wmissing-prototypes xx.c


+ ./xx.32
Direct operations:
Integer value:  2249459722
Integer value:  0x8614080A
Dotted decimal: 10.8.20.134 
Indirect operations (signed):
Integer value:  2249459722
Integer value:  0x8614080A
Dotted decimal: 10.8.20.134 
Indirect operations (unsigned):
Integer value:  2249459722
Integer value:  0x8614080A
Dotted decimal: 10.8.20.134

The results are consistent here.

+ gcc -o xx.64 -m64 -std=c99 -pedantic -Wall -Wshadow -Wpointer-arith -Wstrict-prototypes -Wmissing-prototypes xx.c


+ ./xx.64
Direct operations:
Integer value:  2249459722
Integer value:  0x8614080A
Dotted decimal: 10.8.20.134 
Indirect operations (signed):
Integer value:  2249459722
Integer value:  0x8614080A
Dotted decimal: 10.8.20.134 
Indirect operations (unsigned):
Integer value:  2249459722
Integer value:  0x8614080A
Dotted decimal: 10.8.20.134

And these are the same as the 32-bit case. I'm officially bemused. The main observations remain accurate - be careful, heed compiler warnings (and elicit compiler warnings), and don't assume that "all the world runs on Intel chips" (it used to be "don't assume that all the world is a VAX", once upon a long time ago!).

Jonathan Leffler 2010-01-09 08:06:54

Good point. I thought to mention that in my edit, but I didn't think it would be understood.

hobodave 2010-01-09 08:08:22

The cause for the "unsigned only in C90" warning is that the rules about what the type of an integer literal in C is between C90 and C99. In C90, the types "tried" are `int`, `long int` and `unsigned long int`. In C99, the types are `int`, `long int` and `long long int`. The constant `2249459722` is of type `unsigned long int` in C90 and `long long int` in C99 on your machine. Adding `U` makes it unsigned everywhere.

Alok 2010-01-09 23:59:07

Technically, the `printf()` functions can fail and should be checked too. However, it is unusual to do so in example code - whereas checking inputs such as the `sscanf()` (or `scanf()` in the original) is crucial, even in example code.

Jonathan Leffler 2010-01-10 00:00:36

Surely you meant `!= 1` and not `!= 0` in your `scanf()` call?

Alok 2010-01-10 00:03:17

@Jonathan: Can you tell me the size of `int`, `long int` and `long long int` on your Solaris machine? Just curious.

Alok 2010-01-10 00:40:49

@Alok: under 'gcc' and 'gcc -m32', it is 4/4/8; under 'gcc -m64', it is 4/8/8. I think that the makefile in the directory where I built the test program uses '-m64' by default.

Jonathan Leffler 2010-01-10 00:53:06

@Jonathan: You will hate me for it, but in your `yy.c`, your `sscanf` call scans `"2249559722"`, not `str`, which is `"2249459722"`. See the difference? 2249 **4** 59722 vs 2249 **5** 59722. Now? :-)

Alok 2010-01-10 04:47:35

@Alok: well spotted - and absolutely no hate involved! Indeed, it is a relief to know there's a rational reason for the issue. I'm not sure why the scanf() is not generating an error, though. I don't plan to amend my answer at the moment. And I did remember noting at some point that I was not using the string I'd created...but I didn't fix it. Oh well, ç'est la vie, comme on dit.

Jonathan Leffler 2010-01-10 07:52:10

thanks for the answer jonathan and also for the book.. ;)

Neeraj 2010-01-10 08:18:19

`scanf()` doesn't generate an error in case of overflow. The behavior is undefined. So, `int c; sscanf("123456789012345678901234567890", "%d", ` will (may) return 1, and `c` may contain anything.

Alok 2010-01-10 16:24:24

@Alok: the dark corners of the C standard are indeed dark and twisty. Even the spec for strtol() et al is hard to use: if the 'subject string' value is too big, you get the appropriate min/max integer value (LONG_MIN for negative, LONG_MAX for positive) and errno set to ERANGE. I'd really rather have: `extern errno_t str_to_long(const char *str, char *end, int base, long *result);` with the return value being 0 for OK and ERANGE or other values on error (`errno_t` is defined in TR 24731-1 - see http://stackoverflow.com/questions/372980/ for more on that standard technical report).

Jonathan Leffler 2010-01-10 19:08:19

Answer 3

+1 A:

Alok 2010-01-09 23:45:09

Answer 4

A:

The correct endianness-safe way to write this is

printf("Dotted decimal: %u.%u.%u.%u \n", (c >> 24) & 0xff, (c >> 16) & 0xff, (c >> 8) & 0xff, (c >> 0) & 0xff);

starblue 2010-01-10 17:27:23

ansaurus

tags:

views:

answers:

Unexpected behavior when printing 4-byte integer byte by byte.

Edit:

Alok's Comment

related questions