I'm trying to convert a struct to a char array to send over the network. However, I get some weird output from the char array when I do.

#include <stdio.h>

struct x
{
   int x;
} __attribute__((packed));


int main()
{
   struct x a;
   a.x=127;
   char *b = (char *)&a;
   int i;
   for (i=0; i<4; i++)
      printf("%02x ", b[i]);
   printf("\n");
   for (i=0; i<4; i++)
      printf("%d ", b[i]);
   printf("\n");
   return 0;
}

Here is the output for various values of a.x (on x86 using gcc):
127:
7f 00 00 00
127 0 0 0

128:
ffffff80 00 00 00
-128 0 0 0

255:
ffffffff 00 00 00
-1 0 0 0

256:
00 01 00 00
0 1 0 0

I understand the values for 127 and 256, but why do the numbers change when going to 128? Why wouldn't it just be:

80 00 00 00
128 0 0 0

Am I forgetting to do something in the conversion process or am I forgetting something about integer representation?

*Note: This is just a small test program. In a real program I have more in the struct, better variable names, and I convert to little-endian.
*Edit: formatting

+8  A: 

char is a signed type here; so with two's complement, 0x80 is -128 for an 8-bit integer (i.e. a byte).
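For illustration, a standalone snippet showing the promotion in isolation (assuming char is signed, as on the asker's x86/gcc setup):

#include <stdio.h>

int main(void)
{
   char c = -128;                    /* bit pattern 0x80 */
   printf("%x\n", c);                /* promoted to int with sign preserved: ffffff80 */
   printf("%x\n", (unsigned char)c); /* converted to 0..255 before promotion: 80 */
   return 0;
}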

Rowland Shaw
Love to know why this got downvoted
Rowland Shaw
+1  A: 

char is a signed type on your platform, so what you are seeing is the two's-complement representation; casting to (unsigned char *) will fix that (Rowland just beat me to it).

On a side note you may want to change

for (i=0; i<4; i++) {
//...
}

to

for (i=0; i<sizeof(a); i++) {
//...
}
Kevin Loney
char isn't always signed; signed char is. The signedness of char depends on the compiler. In any case, char, signed char, and unsigned char are three different types.
Johannes Schaub - litb
"char" is obviously signed in this context though, because sign extension is occurring when the parameter is passed to printf on the stack.
dreamlax
dreamlax, indeed his answer is fine :) I just wanted to note that on another system the output could very well be otherwise (non-negative), because char could just as well be unsigned; it depends on the compiler.
Johannes Schaub - litb
A: 

You may want to convert to an unsigned char array.

Otávio Décio
+4  A: 

Treating your struct as if it were a char array is undefined behavior. To send it over the network, use proper serialization instead. It's a pain in C++ and even more so in C, but it's the only way your app will work independently of the machines reading and writing.

http://en.wikipedia.org/wiki/Serialization#C
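In C that usually means writing each field to the wire byte by byte in a documented order. A minimal sketch for the struct in the question (serialize_x is a name made up for illustration, and the choice of most-significant-byte-first is an assumption you'd document as part of your wire format):

#include <stdint.h>

struct x { int x; };   /* the struct from the question */

/* Write p->x as exactly 4 bytes, most significant byte first,
   independent of the host's byte order, int size, or struct padding. */
void serialize_x(const struct x *p, uint8_t out[4])
{
   uint32_t v = (uint32_t)p->x;
   out[0] = (v >> 24) & 0xff;
   out[1] = (v >> 16) & 0xff;
   out[2] = (v >> 8) & 0xff;
   out[3] = v & 0xff;
}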

+3  A: 

The x format specifier by itself says the argument is an int, and since the number is negative, printf needs eight characters to show all four non-zero bytes of the int-sized value. The 0 flag tells printf to pad the output with zeros, and the 2 says the minimum output should be two characters long. As far as I can tell, printf doesn't provide a way to specify a maximum width, except for strings.

Now then, you're only passing a char, so a bare x tells the function to use the full int that got passed, thanks to default argument promotion for "..." parameters. Try the hh length modifier (added in C99) to tell the function to treat the argument as just a char instead:

printf("%02hhx", b[i]);
Rob Kennedy
+8  A: 

What you see is the sign-preserving conversion from char to int. The behavior results from the fact that on your system char is signed (note: char is not signed on all systems). That leads to negative values whenever a bit pattern corresponds to a negative value for a char. Promoting such a char to an int preserves the sign, so the int is negative too. Note that even without an explicit (int) cast, the compiler automatically promotes the char to an int when passing it to printf. The solution is to convert your value to unsigned char first:

for (i=0; i<4; i++)
   printf("%02x ", (unsigned char)b[i]);

Alternatively, you can use unsigned char* from the start on:

unsigned char *b = (unsigned char *)&a;

Then you don't need any cast when you print it with printf.

Johannes Schaub - litb
+1  A: 

Converting your structure to characters or bytes the way you're doing it is going to lead to issues when you try to make it network neutral. Why not address that problem now? There are a variety of different techniques you can use, all of which are likely to be more "portable" than what you're trying to do. For instance:

  • Sending numeric data across the network in a machine-neutral fashion has long been dealt with, in the POSIX/Unix world, via the functions htonl, htons, ntohl and ntohs. See, for example, the byteorder(3) manual page on a FreeBSD or Linux system, and the sketch after this list.
  • Converting data to and from a completely neutral representation like JSON is also perfectly acceptable. The amount of time your programs spend converting the data between JSON and native forms is likely to pale in comparison to the network transmission latencies.
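For instance, sending and receiving the int from the question with htonl/ntohl might look like this (a sketch; pack_x and unpack_x are made-up names, and buf must point to at least 4 bytes):

#include <arpa/inet.h>   /* htonl, ntohl */
#include <stdint.h>
#include <string.h>      /* memcpy */

struct x { int x; };   /* the struct from the question */

void pack_x(const struct x *p, unsigned char *buf)
{
   uint32_t wire = htonl((uint32_t)p->x);   /* host order -> network (big-endian) order */
   memcpy(buf, &wire, sizeof wire);         /* memcpy avoids alignment problems */
}

int unpack_x(const unsigned char *buf)
{
   uint32_t wire;
   memcpy(&wire, buf, sizeof wire);
   return (int)ntohl(wire);                 /* network order -> host order */
}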
Brian Clapper
A: 

Unless you have very convincing measurements showing that every octet is precious, don't do this. Use a readable ASCII protocol like SMTP, NNTP, or one of the many other fine Internet protocols codified by the IETF.

If you really must have a binary format, it's still not safe just to shove out the bytes in a struct, because the byte order, basic sizes, or alignment constraints may differ from host to host. You must design your wire protocol to use well-defined sizes and a well-defined byte order. For your implementation, either use macros like ntohl(3) or use shifting and masking to put bytes into your stream. Whatever you do, make sure your code produces the same results on both big-endian and little-endian hosts.
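For example, the shift-and-mask approach on the receiving end might look like this (a sketch; read_u32 is a made-up name, and the sender is assumed to have written the bytes most significant first):

#include <stdint.h>

/* Reassemble a 32-bit value from 4 bytes, without aliasing a struct
   and with the same result on big- and little-endian hosts. */
uint32_t read_u32(const unsigned char *buf)
{
   return ((uint32_t)buf[0] << 24)
        | ((uint32_t)buf[1] << 16)
        | ((uint32_t)buf[2] << 8)
        |  (uint32_t)buf[3];
}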

Norman Ramsey
A: 

The signedness of the char array is not the root of the problem! (It is *a* problem, but not the only one.)

Alignment! That's the key word here. That's why you should NEVER try to treat structs like raw memory. Compilers (and various optimization flags), operating systems, and phases of the moon all do strange and exciting things to the actual locations in memory of "adjacent" fields in a structure. For example, if you have a struct with a char followed by an int, the whole struct will typically be EIGHT bytes in memory: the char, three useless padding bytes, and then four bytes for the int. The machine does this so that the int lands on an address it can access efficiently.
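You can see the padding for yourself with sizeof and offsetof (a standalone demo; the exact numbers are ABI-dependent, but 8 and 4 are typical on x86):

#include <stdio.h>
#include <stddef.h>   /* offsetof */

struct s {
   char c;   /* 1 byte, then typically 3 bytes of padding */
   int  i;   /* 4 bytes, aligned to a 4-byte boundary */
};

int main(void)
{
   printf("sizeof(struct s)      = %zu\n", sizeof(struct s));        /* typically 8, not 5 */
   printf("offsetof(struct s, i) = %zu\n", offsetof(struct s, i));   /* typically 4 */
   return 0;
}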

Take an introductory course to machine architecture at your local college. Meanwhile, serialize properly. Never treat structs like char arrays.

John Smith
A: 

When you go to send it, just use:

(char*)&CustomPacket

to convert. Works for me.

InfinateOne