I have an array that is like this:

unsigned char array[] = {'\xc0', '\x3f', '\x0e', '\x54', '\xe5', '\x20'};
unsigned char array2[6];

When I use memcpy:

memcpy(array2, array, 6);

And print both of them:

printf("%x %x %x %x %x %x", array[0],  // ... etc
printf("%x %x %x %x %x %x", array2[0], // ... etc

One prints:

c0 3f e 54 e5 20

but the other one prints:

ffffffc0 3f e 54 ffffffe5 20

What happened?

+1  A: 

The %x format expects an integer type. Try casting:

printf("%x %x %x %x %x %x", (int)array2[0], ...

Edit: Since there are new comments on my post, I want to add some information. Before calling printf, the compiler generates code that pushes the variable argument list (...) onto the stack. The compiler doesn't know anything about printf format codes and pushes parameters according to their type. printf collects the parameters from the stack according to the format string. So array[i] is pushed as a char and handled by printf as an int. Therefore, when working with the printf/scanf functions, it is always a good idea to cast if the parameter type doesn't exactly match the format specification.

Alex Farber
char is an integer type. C99 std: 6.2.5 para 4: `There are five standard signed integer types, designated as signed char, short int, int, long int, and long long int. (These and other types may be designated in several additional ways, as described in 6.7.2.)`
osgx
But a char may not be passed on the stack in four bytes, which is what `%x` will be expecting.
Mark B
The function call will promote the chars to ints (and bit fill the existing promotion)
KevinDTimm
Why would this make any difference? It's just making the same conversion that promotion for a varargs function parameter would make anyway only making it explicit.
Charles Bailey
@Osgx, @Mark B: %x expects an int type. A char will always be expanded to an int when passed as a parameter. It's the sign extension as it's expanded that's the issue.
James Curran
@James Curran: nitpick: `%x` expects an unsigned parameter, but integer promotions will _usually_ mean that `unsigned char`, `char` and `signed char` will all be promoted to `int` when passed to `printf`. (Only if an `int` isn't large enough to hold all values of an `unsigned char` would this not be the case - usually on those rare platforms where `int` and `char` have the same width.)
Charles Bailey
Still not correct, `char` parameters to a varargs function are first subject to integral promotions so `array[i]` is converted to `int` and the results of that conversion are passed to the function. It's not passed as a `char`.
Charles Bailey
@Charles: If I'm reading the C99 spec correctly, an `unsigned char` will be promoted to `unsigned int`, rather than `int`, especially as the spec states `The integer promotions preserve value including sign.`
Hasturkun
@Hasturkun: Preserves _values_, yes, but if an `int` can hold all of the (positive) values of an `unsigned char` (which is usually the case) then the type converted to will be `int`.
Charles Bailey
+3  A: 

The problem is not memcpy (unless your char type really is 32 bits, rather than 8); it looks more like integer sign extension while printing.

You may want to change your printf to explicitly use unsigned char conversion, i.e.

printf("%hhx %hhx...", array2[0], array2[1],...);

As a guess, it's possible that your compiler/optimizer is handling array (whose size and contents are known at compile time) and array2 differently, pushing constant values onto the stack in the first place and erroneously pushing sign extended values in the second.

Hasturkun
+4  A: 

You should mask off the higher bits, since your chars will be extended to int size when calling a varargs function:

printf("%x %x %x %x %x %x", array[0] & 0xff,  // ..
unwind
%hhx is better.
osgx
KevinDTimm
`%hhx` is not (yet) C++; remember that C++ refers to the pre-C99 standard version for its `printf` contract.
Charles Bailey
@Charles Bailey, but there is no separate libc for C++, and any C++ program will use the libc from C. So, in most recent (sorry, not ancient) libc versions, hhx will be supported.
osgx
@osgx: Please check your standard, particularly 1.2 Normative references. In the C++ standard what is referred to (and available in C++) as the _Standard C Library_ is clauses 7 of ISO/IEC 9899:1990 and ISO/IEC 9899/Amd.1:1995 i.e. the C90 standard library.
Charles Bailey
@Charles Bailey, yes. But the library itself does not know whether it is running as a C90 library or as a C99 library.
osgx
@osgx: I'm not sure I understand what you're driving at. If you are using a conforming C++ implementation you can only rely on having a C90 _Standard C Library_ available. You cannot count on being able to use C99 features.
Charles Bailey
+7  A: 

I've turned your code into a complete compilable example. I also added a third array of a 'normal' char which on my environment is signed.

#include <cstring>
#include <cstdio>

using std::memcpy;
using std::printf;

int main()
{

        unsigned char array[] = {'\xc0', '\x3f', '\x0e', '\x54', '\xe5', '\x20'};
        unsigned char array2[6];
        char array3[6];

        memcpy(array2, array, 6);
        memcpy(array3, array, 6);

        printf("%x %x %x %x %x %x\n", array[0], array[1], array[2], array[3], array[4], array[5]);
        printf("%x %x %x %x %x %x\n", array2[0], array2[1], array2[2], array2[3], array2[4], array2[5]);
        printf("%x %x %x %x %x %x\n", array3[0], array3[1], array3[2], array3[3], array3[4], array3[5]);

        return 0;
}

My results were what I expected.

c0 3f e 54 e5 20
c0 3f e 54 e5 20
ffffffc0 3f e 54 ffffffe5 20

As you can see, only when the array is of a signed char type do the 'extra' ff digits get prepended. The reason is that when memcpy populates the array of signed char, the values with the high bit set now correspond to negative char values. When passed to printf, the chars are promoted to int, which effectively means a sign extension.

%x prints them in hexadecimal as though they were unsigned int, but as the argument was passed as int the behaviour is technically undefined. Typically on a two's complement machine the behaviour is the same as the standard signed to unsigned conversion which uses mod 2^N arithmetic (where N is the number of value bits in an unsigned int). As the value was only 'slightly' negative (coming from a narrow signed type), post conversion the value is close to the maximum possible unsigned int value, i.e. it has many leading 1's (in binary) or leading f in hex.

Charles Bailey
You always have to be careful checking the value of memory with a print-style statement - much better to use the debugger.
Martin Beckett