I want to send a character array over a TCP socket on Unix.

My first idea was to use an ordinary char array for the struct that will be sent over the socket:

typedef struct __attribute__((packed))
{
   uint8_t type;
   uint8_t id_index;
   char char_value[STR_MSG_MAX];
} a_msg;

I chose this simply because a C char is always 8 bits long. However, after some googling I found out that even if a char is always 8 bits long, the underlying representation could actually be a 32-bit integer. So my impression is that char is maybe not the best way of representing a string in a message that will be sent over a socket from FreeBSD to Linux (or insert some other Unixes if you want to =) ...).

stdint.h is present on all modern Unixes today (I hope), and my thought is that maybe an array of uint8_t or int8_t could do the trick.

typedef struct __attribute__((packed))
{
   uint8_t type;
   uint8_t id_index;
   uint8_t char_value[STR_MSG_MAX];
} a_msg;

or

typedef struct __attribute__((packed))
{
   uint8_t type;
   uint8_t id_index;
   int8_t char_value[STR_MSG_MAX];
} a_msg;

However, uint8_t is an unsigned char and int8_t is a signed char. A standard C char is neither of those, because whether it is signed or unsigned is implementation-defined, as I understand it.

My question is: what is the best way of representing a character array (string) in C that will be sent over TCP/IP in a *nix (Linux, FreeBSD, etc.) platform-independent way?

A: 

In C you cannot say what type you are sending. That information is not transferred, only the bytes are.

All you have to do is:

a_msg msg;   /* a_msg is a type name, so declare a variable of that type first */
char* buffer = (char*)(&msg);

And the safest way is to use unsigned characters if possible.
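
A minimal sketch of viewing the message as raw bytes, not a definitive implementation. The value of STR_MSG_MAX and the helper name msg_to_bytes are assumptions made for illustration; neither appears in the question. The resulting buffer is what would be handed to a call such as send().

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

#define STR_MSG_MAX 64   /* assumed value, the question does not give one */

typedef struct __attribute__((packed))
{
   uint8_t type;
   uint8_t id_index;
   char char_value[STR_MSG_MAX];
} a_msg;

/* Hypothetical helper: copy the bytes of *msg into out.
   Returns the number of bytes copied, or 0 if out is too small. */
size_t msg_to_bytes(const a_msg *msg, unsigned char *out, size_t out_len)
{
   assert(msg != NULL && out != NULL);
   if (out_len < sizeof *msg)
      return 0;
   memcpy(out, msg, sizeof *msg);
   return sizeof *msg;
}
```

Copying into an unsigned char buffer avoids any question of whether plain char is signed on a given platform; the bytes on the wire are the same either way.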

ralu
Well, no. However, this is about representing data, so if you know on the other side what is received, it is another story: then you can cast your byte stream to something else.
Codeape
I just realised that the question was about char / uchar representation on multiple platforms.
ralu
+1  A: 

I personally would go for something like:

typedef struct __attribute__((packed))
{
   uint8_t type;
   uint8_t id_index;
   uint8_t padding[2]; //this is to align to 32bit boundary
   uint8_t char_value[STR_MSG_MAX];
} a_msg;

But it will work without the padding.

In C a char is 8 bits long on every platform you are likely to meet, so an array of char is an array of bytes. However, the character literal 'x' has type int, which is typically 32 bits. This can be verified using the sizeof operator on a character literal. You will also see that all the functions that return a single character, like getchar, return an int. The reason is that we need a way of indicating end of file (EOF). This can only be done using a value outside the range of an unsigned char.
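
The sizeof claim can be checked with a short sketch like the one below. The helper names are hypothetical, and the code must be compiled as C, not C++ (where 'x' has type char).

```c
#include <assert.h>
#include <limits.h>
#include <stddef.h>
#include <stdio.h>

/* Size of the character constant 'x'. In C (not C++) it has type int. */
size_t char_constant_size(void)
{
   return sizeof 'x';
}

/* Size of a char object, which is 1 by definition. */
size_t char_object_size(void)
{
   char c = 'x';
   return sizeof c;
}

/* Hypothetical helper: does ch fit in an unsigned char?
   getchar() returns such a value for real data and EOF otherwise. */
int is_char_value(int ch)
{
   return ch >= 0 && ch <= UCHAR_MAX;
}

/* Runs the checks described above; aborts if any of them fails. */
void check_char_sizes(void)
{
   assert(char_constant_size() == sizeof(int));
   assert(char_object_size() == 1);
   assert(!is_char_value(EOF));
}
```

None of this affects the message layout: the array members are char objects of size 1, whatever the type of a constant in an expression.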

doron
In common modern practice, a `char` is 8 bits long. But this is not required by the standard -- C will work on a 6 bit or 16 bit processor.
bstpierre
@bstpierre: The *minimum* allowed for a `char` is 8 bits. So a 6 bit processor would have to represent a `char` with a 12 bit word, and C programs wouldn't be able to address individual 6 bit cells.
caf
@caf - You're right, I typo'd a character above. Meant to say 36 or 16. Thanks for the correction.
bstpierre
There is no such thing as character literals in C. What you are referring to are "integer character constants" and C99 states: `An integer character constant has type int`. So no need to do padding, this is useless here. `char` and thus `uint8_t` always have `sizeof` equal to `1` by definition and this is always the smallest unit of data that you may access with a pointer, and this will always be the alignment you get for `char` compatible types. Whether or not this corresponds to 8 bits can be checked with `CHAR_BIT` and is guaranteed, as caf's answer indicates, as soon as `uint8_t` exists.
Jens Gustedt
@Jens Gustedt, So it does not matter if I use char or uint8_t?
Codeape
Even if the padding is not 100% necessary, it costs little and may simplify basic copy operations on the array.
doron
@Codeape: It does not matter for the padding and stuff like that. It does matter for your two other fields, where it is really good practice (as you did) to make them unsigned integer types. For the data itself, it is common practice to view them as "uninterpreted" bytes, thus `char` is a good choice here.
Jens Gustedt
Thanks. I will go for an array of char and use padding =)
Codeape
+4  A: 

Although char may be more than 8 bits wide, it must always be the (equal) narrowest type. (Since, among other reasons, sizeof(char) is defined to be 1).

So if the platform provides int8_t, then char must be exactly 8 bits too (since char is separately restricted to be at least 8 bits). This implies that you might as well use char.
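
A small sketch of how this reasoning could be checked on a given platform. The function names are hypothetical; the UINT8_MAX test relies on the fact that stdint.h only defines that macro when uint8_t exists.

```c
#include <assert.h>
#include <limits.h>
#include <stdint.h>

/* uint8_t is optional: UINT8_MAX is only defined when the type exists. */
#ifndef UINT8_MAX
#error "no exact 8-bit type on this platform"
#endif

/* Returns 1 when char is exactly one 8-bit byte, as the argument above predicts
   for any platform that provides uint8_t. */
int char_is_octet(void)
{
   return CHAR_BIT == 8 && sizeof(char) == 1 && sizeof(uint8_t) == 1;
}

/* Aborts at run time if the assumption does not hold. */
void check_char_width(void)
{
   assert(char_is_octet());
}
```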

caf
What happens if I send from a platform where a char is represented by, for example, a 32-bit int and the receiving end represents a character by an 8-bit data type? If I send a 40-character array I will send 40 x 32 bits, but what will happen on the receiving side, which represents it with a structure of 40 x 8 bits?
Codeape
@Codeape: Sure, that's a problem - but it's not one that can be solved by using `int8_t`, since a platform with 32 bit `char` can't provide that type. Such platforms do not usually have network libraries anyway.
caf
I will use an array of char.
Codeape
A: 

I think the idea of packing the struct is the way to go. I would write some test code to make sure it is working. Do a sizeof(a_msg) to see what size it is. You should be able to tell if the packing worked without having to send messages over the socket.
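
Such test code might look like the sketch below. The value of STR_MSG_MAX and the function names are assumptions for illustration, and __attribute__((packed)) is a GCC/Clang extension.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define STR_MSG_MAX 64   /* assumed value, the question does not give one */

typedef struct __attribute__((packed))
{
   uint8_t type;
   uint8_t id_index;
   char char_value[STR_MSG_MAX];
} a_msg;

/* Returns 1 when the struct has no padding and the fields sit where expected. */
int packing_ok(void)
{
   return sizeof(a_msg) == 2 + STR_MSG_MAX
       && offsetof(a_msg, type) == 0
       && offsetof(a_msg, id_index) == 1
       && offsetof(a_msg, char_value) == 2;
}

/* Aborts at run time if the layout is not the expected one. */
void check_packing(void)
{
   assert(packing_ok());
}
```

Calling check_packing() once at start-up on both the sending and the receiving side is enough to catch a layout mismatch before any message goes over the socket.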

zooropa
Good. I am packing the struct.
Codeape