Hello StackOverflow,

I want to transmit data over the network, but I don't want to use any foreign libraries (Standard C/C++ is ok).

for example:

unsigned int x = 123;
char y[3] = {'h', 'i', '\0'};
float z = 1.23f;

I want this in an

char xyz[11];

array.

Note: To transmit it over the network, I need network byte order for the unsigned int (the htonl function), and I need to somehow serialize the float in IEEE 754 form (there are many functions for that on the internet). I know how to do both.

How do I get them into the xyz array, nicely lined up end to end, so I can use it as a buffer for my socket and send() call? Obviously I have reverse functions (ntohl, and a reverse IEEE 754 conversion) to get them out again, but I need a technique for that too, preferably the same one.

It would be something like this:

xyz in binary: 
00000000 00000000 00000000 01111011 | 01101000 | 01101001 | 00000000 | 00111111 10011101 01110000 10100100
- big endian repr. of u. int 123  - | - 'h'  - | - 'i'  - | - '\0' - | -   IEEE 754 repr of float 1.23   -

How can I accomplish this without external libraries and minimal use of standard library functions? This isn't so much for my program as for me to learn from.

A: 

What exactly is your goal? And what exactly are the means you're willing to use?

If you just want to get the job done with one particular compiler on one particular computer, then the fastest and easiest, but also dirtiest, solution is to use a union. You define a struct that has your items as members and overlay it with a character array of the same size. You need to tell the compiler to pack the members tightly, with something along the lines of #pragma pack(1), and your problem is solved: you store the three values in the members, and then read the same memory as a character array.

If the machine is little endian, and you need big endian ints / floats, you just swap the relevant characters.
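A rough sketch of the union approach, under a few assumptions: GCC or a compatible compiler, a 4-byte IEEE 754 float, and a float byte order that matches the integer byte order. The names Packet, PacketBytes, reverseBytes, hostIsLittleEndian and buildPacket are made up for illustration. Reading the character array after writing the struct members is formally undefined in standard C++, although GCC documents it as supported, which is part of why this is the "dirty" solution.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <cstring>

// Hypothetical wire layout: 4-byte unsigned, 3 chars, 4-byte float = 11 bytes.
#pragma pack(push, 1)
struct Packet {
    std::uint32_t x;
    char          y[3];
    float         z;
};
#pragma pack(pop)

static_assert(sizeof(Packet) == 11, "packing failed or float is not 4 bytes");

// The struct and the byte array share the same storage.
union PacketBytes {
    Packet        fields;
    unsigned char raw[sizeof(Packet)];
};

// Reverse n bytes in place (turns a little endian field into big endian).
inline void reverseBytes(unsigned char *p, std::size_t n) {
    for (std::size_t i = 0; i < n / 2; ++i) {
        unsigned char tmp = p[i];
        p[i] = p[n - 1 - i];
        p[n - 1 - i] = tmp;
    }
}

// Detect the host byte order at run time.
inline bool hostIsLittleEndian() {
    const std::uint16_t probe = 1;
    unsigned char first;
    std::memcpy(&first, &probe, 1);
    return first == 1;
}

// Fill `out` with the 11 bytes from the question, in network byte order.
inline void buildPacket(unsigned char *out, std::size_t outLen) {
    assert(outLen >= sizeof(Packet));
    PacketBytes pb;
    pb.fields.x = 123;
    std::memcpy(pb.fields.y, "hi", 3);   // copies 'h', 'i', '\0'
    pb.fields.z = 1.23f;
    if (hostIsLittleEndian()) {
        // Swap only the multi-byte fields; the chars stay as they are.
        reverseBytes(pb.raw + offsetof(Packet, x), sizeof pb.fields.x);
        reverseBytes(pb.raw + offsetof(Packet, z), sizeof pb.fields.z);
    }
    std::memcpy(out, pb.raw, sizeof pb.raw);
}
```

Calling buildPacket(buf, sizeof buf) on an 11-byte buffer gives you something you can hand straight to send(). The static_assert is there so the build fails loudly if the compiler ignores the packing request.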

But there are at least another dozen solutions that come to mind if you have other goals, like portability, non-standard byte order, sizeof(int) !=4, float not stored in IEEE format internally, etc.

Carsten Kuckuk
I want to learn to serialize primitive C/C++ Datatypes (in a way so I can add knowledge of serializing structs later) in C/C++. The means are any C/C++ functions accepted by GCC, and Standard C/C++ Library functions. Thank you, I will look into unions.
wsd
+1  A: 

Something like the code below would do it. Watch out for problems where sizeof(unsigned int) is different on different systems, those will get you. For things like this you're better off using types with well-defined sizes, like int32_t. Anyway...

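#include <arpa/inet.h>   // htonl
#include <sys/socket.h>  // send
#include <cstdint>       // uint32_t
#include <cstring>       // memcpy

unsigned int x = 123;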
char y[3] = {'h', 'i', '\0'};
float z = 1.23f;

// The buffer we will be writing bytes into
unsigned char outBuf[sizeof(x)+sizeof(y)+sizeof(z)];

// A pointer we will advance whenever we write data
unsigned char * p = outBuf;

// Serialize "x" into outBuf
uint32_t neX = htonl(x);
memcpy(p, &neX, sizeof(neX));
p += sizeof(neX);

// Serialize "y" into outBuf
memcpy(p, y, sizeof(y));
p += sizeof(y);

// Serialize "z" into outBuf (copy the float's bits out with memcpy
// rather than casting the pointer; assumes a 4-byte float)
uint32_t zBits;
memcpy(&zBits, &z, sizeof(zBits));
uint32_t neZ = htonl(zBits);
memcpy(p, &neZ, sizeof(neZ));
p += sizeof(neZ);

int resultCode = send(mySocket, outBuf, p-outBuf, 0);
[...]

... and of course the receiving code would do something similar, except in reverse.
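For illustration, the receiving side might look roughly like this. It is only a sketch: it assumes a POSIX system for ntohl, a 4-byte IEEE 754 float on both ends, and that all 11 bytes have already been read from the socket. The Message struct and the deserialize function are hypothetical names, not part of any library.

```cpp
#include <arpa/inet.h>   // ntohl (POSIX)
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <cstring>

static_assert(sizeof(float) == sizeof(std::uint32_t), "float must be 4 bytes");

// Hypothetical container for the three decoded values.
struct Message {
    std::uint32_t x;
    char          y[3];
    float         z;
};

// Decode one message from inBuf; returns the number of bytes consumed.
inline std::size_t deserialize(const unsigned char *inBuf, std::size_t len, Message &m) {
    assert(len >= 11);                 // 4 + 3 + 4 bytes on the wire

    // A pointer we advance whenever we read data
    const unsigned char *p = inBuf;

    // Deserialize "x": copy the bytes out, then convert to host order
    std::uint32_t neX;
    std::memcpy(&neX, p, sizeof neX);
    m.x = ntohl(neX);
    p += sizeof neX;

    // Deserialize "y": plain chars need no byte swapping
    std::memcpy(m.y, p, sizeof m.y);
    p += sizeof m.y;

    // Deserialize "z": convert the 32-bit pattern, then copy it into the float
    std::uint32_t neZ;
    std::memcpy(&neZ, p, sizeof neZ);
    const std::uint32_t zBits = ntohl(neZ);
    std::memcpy(&m.z, &zBits, sizeof m.z);
    p += sizeof neZ;

    return static_cast<std::size_t>(p - inBuf);
}
```

The reads mirror the writes one for one, in the same order, which is what keeps the two sides in step. Keep in mind that recv() may return fewer bytes than you asked for, so loop until the whole message has arrived before decoding.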

Jeremy Friesner
+1  A: 

It's missing floating point support, so you'd have to figure that out for your particular machine, but I've used the following.

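#include <stdarg.h>     /* va_list, va_start, va_arg, va_end */
#include <stdint.h>     /* uint8_t, uint16_t, uint32_t, uint64_t */
#include <unistd.h>     /* read, write */

/*Just for "consistency" */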
#define UNPACK8(p)      ((p)[0])
#define PACK8(p,v)      (p)[0]=(v)

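/* Cast each byte before shifting: a uint8_t promotes to int, so <<24 could overflow and <<32 is undefined */
#define UNPACK16LE(p)   ((p)[0]|((p)[1]<<8))
#define UNPACK32LE(p)   ((uint32_t)(p)[0]|((uint32_t)(p)[1]<<8)|((uint32_t)(p)[2]<<16)|((uint32_t)(p)[3]<<24))
#define UNPACK64LE(p)   ((uint64_t)(p)[0]|((uint64_t)(p)[1]<<8)|((uint64_t)(p)[2]<<16)|((uint64_t)(p)[3]<<24)|\
                        ((uint64_t)(p)[4]<<32)|((uint64_t)(p)[5]<<40)|((uint64_t)(p)[6]<<48)|((uint64_t)(p)[7]<<56))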

#define PACK16LE(p,v)   (p)[0]=(v);(p)[1]=(v)>>8
#define PACK32LE(p,v)   (p)[0]=(v);(p)[1]=(v)>>8UL;(p)[2]=(v)>>16UL;(p)[3]=(v)>>24UL
#define PACK64LE(p,v)   (p)[0]=(v);(p)[1]=(v)>>8ULL;(p)[2]=(v)>>16ULL;(p)[3]=(v)>>24ULL;\
                        (p)[4]=(v)>>32ULL;(p)[5]=(v)>>40ULL;(p)[6]=(v)>>48ULL;(p)[7]=(v)>>56ULL

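/* Cast each byte before shifting: a uint8_t promotes to int, so <<24 could overflow and <<32 is undefined */
#define UNPACK16BE(p)   ((p)[1]|((p)[0]<<8))
#define UNPACK32BE(p)   ((uint32_t)(p)[3]|((uint32_t)(p)[2]<<8)|((uint32_t)(p)[1]<<16)|((uint32_t)(p)[0]<<24))
#define UNPACK64BE(p)   ((uint64_t)(p)[7]|((uint64_t)(p)[6]<<8)|((uint64_t)(p)[5]<<16)|((uint64_t)(p)[4]<<24)|\
                        ((uint64_t)(p)[3]<<32)|((uint64_t)(p)[2]<<40)|((uint64_t)(p)[1]<<48)|((uint64_t)(p)[0]<<56))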

#define PACK16BE(p,v)   (p)[1]=(v);(p)[0]=(v)>>8
#define PACK32BE(p,v)   (p)[3]=(v);(p)[2]=(v)>>8UL;(p)[1]=(v)>>16UL;(p)[0]=(v)>>24UL
#define PACK64BE(p,v)   (p)[7]=(v);(p)[6]=(v)>>8ULL;(p)[5]=(v)>>16ULL;(p)[4]=(v)>>24ULL;\
                        (p)[3]=(v)>>32ULL;(p)[2]=(v)>>40ULL;(p)[1]=(v)>>48ULL;(p)[0]=(v)>>56ULL


void
vapickle(uint8_t * buf, const char *fmt, va_list args)
{
        const char *fp;
        uint8_t u8;
        uint16_t u16;
        uint32_t u32;
        uint64_t u64;

        for (fp = fmt; *fp; fp++) {
                switch (*fp) {
                case 'b':
                        u8 = va_arg(args, unsigned int);
                        PACK8(buf,u8);
                        ++buf;
                        break;
                case 'S':
                        u16= va_arg(args,unsigned int);
                        PACK16BE(buf,u16);
                        buf+=2;
                        break;
                case 'I':
                        u32 = va_arg(args, unsigned int);
                        PACK32BE(buf,u32);
                        buf+=4;
                        break;
                case 'L':
                        u64 = va_arg(args, uint64_t);
                        PACK64BE(buf,u64);
                        buf+=8;
                        break;
                case 's':
                        u16= va_arg(args,unsigned int);
                        PACK16LE(buf,u16);
                        buf+=2;
                        break;
                case 'i':
                        u32 = va_arg(args, unsigned int);
                        PACK32LE(buf,u32);
                        buf+=4;
                        break;
                case 'l':
                        u64 = va_arg(args, uint64_t);
                        PACK64LE(buf,u64);
                        buf+=8;
                        break;
                case ' ':
                        ++buf;
                        break;

                }
        }

}

void
pickle(uint8_t *buf, const char *fmt, ...)
{
        va_list args;

        va_start(args, fmt);
        vapickle(buf,  fmt, args);
        va_end(args);
}

void
vadepickle(const uint8_t*buf, const char *fmt, va_list args)
{
        const char *fp;
        uint8_t *u8;
        uint16_t *u16;
        uint32_t *u32;
        uint64_t *u64;

        for (fp = fmt; *fp; fp++) {
                switch (*fp) {
                case 'b':
                        u8 = va_arg(args, uint8_t*);
                        *u8 = UNPACK8(buf);
                        ++buf;
                        break;
                case 'S':
                        u16= va_arg(args, uint16_t*);
                        *u16 = UNPACK16BE(buf);
                        buf+=2;
                        break;
                case 'I':
                        u32 = va_arg(args, uint32_t*);
                        *u32 = UNPACK32BE(buf);
                        buf+=4;
                        break;
                case 'L':
                        u64 = va_arg(args, uint64_t*);
                        *u64 = UNPACK64BE(buf);
                        buf+=8;
                        break;
                case 's':
                        u16 = va_arg(args,uint16_t*);
                        *u16 = UNPACK16LE(buf);
                        buf+=2;
                        break;
                case 'i':
                        u32 = va_arg(args, uint32_t*);
                        *u32 = UNPACK32LE(buf);
                        buf+=4;
                        break;
                case 'l':
                        u64 = va_arg(args, uint64_t*);
                        *u64 = UNPACK64LE(buf);
                        buf+=8;
                        break;
                case ' ':
                        ++buf;
                        break;
                }
        }

}

void
depickle(const uint8_t *buf, const char *fmt, ...)
{
        va_list args;

        va_start(args, fmt);
        vadepickle(buf, fmt, args);
        va_end(args);

}

int main(void)
{
        uint32_t i = 0x123456;
        uint16_t j = 0xabab;
        uint16_t k = 0xbbab;
        uint8_t buf[8]; /* keep this big enough according to the format string */ 

        pickle(buf,"sis",j,i,k); /* s = uint16 little endian, i = uint32 little endian, S = uint16 big endian, etc. */
        write(1,buf,sizeof buf);
        read(0,buf,sizeof buf);
        depickle(buf,"SiS",&j,&i,&k);

        return 0;

}
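One possible way to fill the floating point gap, sketched here under the assumption that float is a 4-byte IEEE 754 value on both ends: copy the float's bit pattern into a uint32_t and pack that like any other 32-bit integer. The function names packFloat32BE and unpackFloat32BE are hypothetical, chosen only to match the naming above.

```cpp
#include <cassert>
#include <cstdint>
#include <cstring>

static_assert(sizeof(float) == sizeof(std::uint32_t), "float must be 4 bytes");

// Write `value` into buf[0..3] as a big endian IEEE 754 bit pattern.
inline void packFloat32BE(std::uint8_t *buf, float value) {
    assert(buf != nullptr);
    std::uint32_t bits;
    std::memcpy(&bits, &value, sizeof bits);      // grab the bits without aliasing tricks
    buf[0] = static_cast<std::uint8_t>(bits >> 24);
    buf[1] = static_cast<std::uint8_t>(bits >> 16);
    buf[2] = static_cast<std::uint8_t>(bits >> 8);
    buf[3] = static_cast<std::uint8_t>(bits);
}

// Read a big endian IEEE 754 bit pattern from buf[0..3] back into a float.
inline float unpackFloat32BE(const std::uint8_t *buf) {
    assert(buf != nullptr);
    const std::uint32_t bits = (static_cast<std::uint32_t>(buf[0]) << 24) |
                               (static_cast<std::uint32_t>(buf[1]) << 16) |
                               (static_cast<std::uint32_t>(buf[2]) << 8)  |
                                static_cast<std::uint32_t>(buf[3]);
    float value;
    std::memcpy(&value, &bits, sizeof value);
    return value;
}
```

Because the shifts work on the value of the integer rather than on its layout in memory, this produces the same bytes on little endian and big endian hosts. A format character for floats could then call these from the switch, keeping in mind that a float passed through "..." arrives as a double.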
nos
Is pickle a widely used term (I thought serializing, marshalling) or do you just like little cucumbers fermented in vinegar?
wsd
Python has a 'pickle' module that does something similar to this...
Jeremy Friesner
+1  A: 

This discussion seems relevant to your question, but it uses the Boost serialization API.

Moshe Kravchik
I think Boost will teach you a lot (you can look up the implementation). It will also give you a ready-made solution to numerous issues you might not think of yourself.
Moshe Kravchik
I am browsing around the Boost Serialization API as I write this (in another tab xD), but it seems like overkill for what I want to do. Trying to programmatically distill it...
wsd
+7  A: 

Ah, you want to serialize primitive data types! In principle, there are two approaches. The first is to grab the internal, in-memory binary representation of the data you want to serialize, reinterpret it as a sequence of characters, and use that as your representation:

So if you have a:

double d;

you take the address of that, reinterpret that pointer as a pointer to character, and then use these characters:

double *pd=&d;
char *pc = reinterpret_cast<char*>(pd); 
for(size_t i=0; i<sizeof(double); i++) 
{
   char ch = *pc;   
   DoSomethingWith(ch);   
   pc++;
}

This works with all primitive data types. The main problem here is that the binary representation is implementation dependent (mainly CPU dependent). (And you will run into subtle bugs when you try doing this with IEEE NaNs...)

All in all, this approach is not portable at all, as you have no control at all over the representation of your data.

The second approach is to use a higher-level representation that you yourself control. If performance is not an issue, you could use std::stringstream and the >> and << operators to stream primitive C type variables into and out of std::strings. This is slow but easy to read and debug, and very portable on top of it.
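A small sketch of what the text-based approach could look like, using std::ostringstream and std::istringstream from the standard library. The helper names toText and fromText are invented for this example, and it assumes the string field contains no whitespace, since >> stops at the first space.

```cpp
#include <cassert>
#include <limits>
#include <sstream>
#include <string>

// Write the three values as space-separated text.
inline std::string toText(unsigned int x, const std::string &y, float z) {
    std::ostringstream out;
    // max_digits10 significant digits are enough to read the same float back.
    out.precision(std::numeric_limits<float>::max_digits10);
    out << x << ' ' << y << ' ' << z;
    return out.str();
}

// Parse the text produced by toText; returns false if any field fails to parse.
inline bool fromText(const std::string &text, unsigned int &x, std::string &y, float &z) {
    std::istringstream in(text);
    return static_cast<bool>(in >> x >> y >> z);
}

// Usage: round-trip the values from the question.
inline void roundTripExample() {
    unsigned int x = 0;
    std::string  y;
    float        z = 0.0f;
    const bool ok = fromText(toText(123, "hi", 1.23f), x, y, z);
    assert(ok && x == 123 && y == "hi" && z == 1.23f);
}
```

The result is an ordinary std::string, so you can print it while debugging and send text.data() with text.size() over the socket. The price is larger messages and parsing time, and you still need some way to mark where one message ends and the next begins.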

Carsten Kuckuk
+1 for highlighting issues, and adding undefined padding. And I'll take the bait :), what are the subtle bugs with IEEE NaNs in this scenario? Thanks..
rama-jka toti
There are signalling NaNs and non-signalling (quiet) NaNs. When you work with these representations as char arrays, you can read and write them easily. But when you access them as floats, just the act of reading them can cause the CPU to signal. So if you're not careful, you can end up with a program that deserializes without a problem, but once you touch the float, you end up in trouble. And as this thread is about learning, I thought I might point out this area.
Carsten Kuckuk
+1, and I haven't seen it mentioned here in this context.. although vendors tend to avoid marshalling, as well as serialising floats of any kind, finally :)
rama-jka toti
performance is always an issue :) Still this is the clearest answer on this post, thank you!
wsd