ansaurus

Question

Serialize Strings, ints and floats to character arrays for networking WITHOUT LIBRARIES

Answer 1

A:

What exactly is your goal? And what exactly are the means you're willing to use?

If you just want to get the job done with one particular compiler on one particular computer, then the fastest and easiest, but also dirtiest, solution is to use a union. You define a struct that has your items as members and overlay it with a character array of the same size. You need to tell the compiler to pack the members tightly, with something along the lines of #pragma pack(1), and your problem is solved. You just store the three values in the members and then read the union as a character array.

If the machine is little endian and you need big-endian ints or floats, you just swap the relevant bytes.

But at least another dozen solutions come to mind if you have other goals or constraints, like portability, a non-standard byte order, sizeof(int) != 4, floats not stored in IEEE format internally, etc.
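
The union approach described above can be sketched roughly as follows. This is a minimal illustration, not a definitive implementation: the names message, message_bytes, swap_bytes and message_to_bytes are hypothetical, the layout (one 32-bit int, a 3-byte string, one float) is borrowed from the question, and it assumes a compiler that honours #pragma pack (GCC does) and a 4-byte float.

```c
#include <stdint.h>
#include <string.h>

/* Hypothetical message layout: one int, a 3-byte string, one float. */
#pragma pack(push, 1)           /* no padding between members */
struct message {
    uint32_t x;
    char     y[3];
    float    z;
};
#pragma pack(pop)

/* Overlay the struct with a byte array of the same size. */
union message_bytes {
    struct message msg;
    unsigned char  bytes[sizeof(struct message)];
};

/* Reverse n bytes in place, e.g. to turn a little-endian field
   into a big-endian one. */
static void swap_bytes(unsigned char *p, size_t n)
{
    for (size_t i = 0; i < n / 2; i++) {
        unsigned char tmp = p[i];
        p[i] = p[n - 1 - i];
        p[n - 1 - i] = tmp;
    }
}

/* Store the values in the struct, then read them back as raw bytes.
   The bytes come out in host byte order.
   Returns the number of bytes written (11 with a 4-byte float). */
static size_t message_to_bytes(uint32_t x, const char y[3], float z,
                               unsigned char *out)
{
    union message_bytes u;
    u.msg.x = x;
    memcpy(u.msg.y, y, sizeof u.msg.y);
    u.msg.z = z;
    memcpy(out, u.bytes, sizeof u.bytes);
    return sizeof u.bytes;
}
```

On a little-endian host, calling swap_bytes(out, 4) and swap_bytes(out + 7, 4) afterwards would put the int and the float into big-endian order. Reading a union through a member other than the one last written is fine in C; in C++ it relies on compiler-specific behaviour.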

Carsten Kuckuk 2009-11-09 20:08:51

I want to learn to serialize primitive C/C++ Datatypes (in a way so I can add knowledge of serializing structs later) in C/C++. The means are any C/C++ functions accepted by GCC, and Standard C/C++ Library functions. Thank you, I will look into unions.

wsd 2009-11-09 20:12:16

Answer 2

+1 A:

Something like the code below would do it. Watch out for sizeof(unsigned int) differing between systems; that will get you. For things like this you're better off using types with well-defined sizes, like int32_t. Anyway...

#include <arpa/inet.h>   // htonl
#include <stdint.h>      // uint32_t
#include <string.h>      // memcpy
#include <sys/socket.h>  // send

unsigned int x = 123;
char y[3] = {'h', 'i', '\0'};
float z = 1.23f;

// The buffer we will be writing bytes into
unsigned char outBuf[sizeof(x)+sizeof(y)+sizeof(z)];

// A pointer we will advance whenever we write data
unsigned char * p = outBuf;

// Serialize "x" into outBuf, in network (big-endian) byte order
uint32_t neX = htonl(x);
memcpy(p, &neX, sizeof(neX));
p += sizeof(neX);

// Serialize "y" into outBuf
memcpy(p, y, sizeof(y));
p += sizeof(y);

// Serialize "z" into outBuf: copy the float's bits into an integer
// (assumes a 32-bit float), then byte-swap them the same way as "x"
uint32_t zBits;
memcpy(&zBits, &z, sizeof(zBits));
uint32_t neZ = htonl(zBits);
memcpy(p, &neZ, sizeof(neZ));
p += sizeof(neZ);

// Send exactly the bytes written so far
int resultCode = send(mySocket, outBuf, p-outBuf, 0);
[...]

... and of course the receiving code would do something similar, except in reverse.
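
The receiving side could look roughly like this. It is only a sketch under the same assumptions as the sending code (a 4-byte unsigned int, three chars, a 4-byte float, integers in network byte order); the function name deserialize and its signature are made up for illustration, and the bytes are assumed to have already been read from the socket into inBuf.

```c
#include <arpa/inet.h>  /* ntohl (POSIX) */
#include <stdint.h>
#include <string.h>

_Static_assert(sizeof(float) == sizeof(uint32_t), "float must be 32 bits");

/* Hypothetical receiver: reads x, y and z back out of inBuf.
   Returns the number of bytes consumed. */
static size_t deserialize(const unsigned char *inBuf,
                          uint32_t *x, char y[3], float *z)
{
    /* A pointer we advance whenever we read data */
    const unsigned char *p = inBuf;

    /* Deserialize "x": copy the bytes out, then convert to host order */
    uint32_t neX;
    memcpy(&neX, p, sizeof(neX));
    *x = ntohl(neX);
    p += sizeof(neX);

    /* Deserialize "y": plain bytes, no byte order involved */
    memcpy(y, p, 3);
    p += 3;

    /* Deserialize "z": convert the integer bits, then copy them into the float */
    uint32_t neZ;
    memcpy(&neZ, p, sizeof(neZ));
    uint32_t zBits = ntohl(neZ);
    memcpy(z, &zBits, sizeof(*z));
    p += sizeof(neZ);

    return (size_t)(p - inBuf);
}
```

Using memcpy rather than casting the buffer pointer avoids alignment and aliasing problems when the fields do not start at aligned offsets.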

Jeremy Friesner 2009-11-09 20:12:03

Answer 3

+1 A:

It doesn't handle floating point, so you'd have to figure that out for your particular machine, but I've used the following.

#include <stdarg.h>     /* va_list, va_start, va_arg, va_end */
#include <stdint.h>     /* uint8_t ... uint64_t */
#include <unistd.h>     /* read, write */

/* Just for "consistency" */
#define UNPACK8(p)      ((p)[0])
#define PACK8(p,v)      ((p)[0]=(uint8_t)(v))

/* Each byte is widened before shifting so that no shift overflows int. */
#define UNPACK16LE(p)   ((uint16_t)((p)[0]|((p)[1]<<8)))
#define UNPACK32LE(p)   ((uint32_t)(p)[0]|((uint32_t)(p)[1]<<8)|\
                                ((uint32_t)(p)[2]<<16)|((uint32_t)(p)[3]<<24))
#define UNPACK64LE(p)   ((uint64_t)UNPACK32LE(p)|((uint64_t)UNPACK32LE((p)+4)<<32))

/* do { } while (0) keeps each multi-statement macro a single statement. */
#define PACK16LE(p,v)   do{(p)[0]=(uint8_t)(v);(p)[1]=(uint8_t)((v)>>8);}while(0)
#define PACK32LE(p,v)   do{(p)[0]=(uint8_t)(v);(p)[1]=(uint8_t)((v)>>8);\
                           (p)[2]=(uint8_t)((v)>>16);(p)[3]=(uint8_t)((v)>>24);}while(0)
#define PACK64LE(p,v)   do{PACK32LE(p,v);PACK32LE((p)+4,(v)>>32);}while(0)

#define UNPACK16BE(p)   ((uint16_t)((p)[1]|((p)[0]<<8)))
#define UNPACK32BE(p)   ((uint32_t)(p)[3]|((uint32_t)(p)[2]<<8)|\
                                ((uint32_t)(p)[1]<<16)|((uint32_t)(p)[0]<<24))
#define UNPACK64BE(p)   (((uint64_t)UNPACK32BE(p)<<32)|(uint64_t)UNPACK32BE((p)+4))

#define PACK16BE(p,v)   do{(p)[1]=(uint8_t)(v);(p)[0]=(uint8_t)((v)>>8);}while(0)
#define PACK32BE(p,v)   do{(p)[3]=(uint8_t)(v);(p)[2]=(uint8_t)((v)>>8);\
                           (p)[1]=(uint8_t)((v)>>16);(p)[0]=(uint8_t)((v)>>24);}while(0)
#define PACK64BE(p,v)   do{PACK32BE((p)+4,v);PACK32BE(p,(v)>>32);}while(0)


void
vapickle(uint8_t * buf, const char *fmt, va_list args)
{
        const char *fp;
        uint8_t u8;
        uint16_t u16;
        uint32_t u32;
        uint64_t u64;

        for (fp = fmt; *fp; fp++) {
                switch (*fp) {
                case 'b':
                        u8 = va_arg(args, unsigned int);
                        PACK8(buf,u8);
                        ++buf;
                        break;
                case 'S':
                        u16= va_arg(args,unsigned int);
                        PACK16BE(buf,u16);
                        buf+=2;
                        break;
                case 'I':
                        u32 = va_arg(args, unsigned int);
                        PACK32BE(buf,u32);
                        buf+=4;
                        break;
                case 'L':
                        u64 = va_arg(args, unsigned long long);
                        PACK64BE(buf,u64);
                        buf+=8;
                        break;
                case 's':
                        u16= va_arg(args,unsigned int);
                        PACK16LE(buf,u16);
                        buf+=2;
                        break;
                case 'i':
                        u32 = va_arg(args, unsigned int);
                        PACK32LE(buf,u32);
                        buf+=4;
                        break;
                case 'l':
                        u64 = va_arg(args, uint64_t);
                        PACK64LE(buf,u64);
                        buf+=8;
                        break;
                case ' ':
                        ++buf;
                        break;

                }
        }

}

void
pickle(uint8_t *buf, const char *fmt, ...)
{
        va_list args;

        va_start(args, fmt);
        vapickle(buf,  fmt, args);
        va_end(args);
}

void
vadepickle(const uint8_t*buf, const char *fmt, va_list args)
{
        const char *fp;
        uint8_t *u8;
        uint16_t *u16;
        uint32_t *u32;
        uint64_t *u64;

        for (fp = fmt; *fp; fp++) {
                switch (*fp) {
                case 'b':
                        u8 = va_arg(args, uint8_t*);
                        *u8 = UNPACK8(buf);
                        ++buf;
                        break;
                case 'S':
                        u16= va_arg(args, uint16_t*);
                        *u16 = UNPACK16BE(buf);
                        buf+=2;
                        break;
                case 'I':
                        u32 = va_arg(args, uint32_t*);
                        *u32 = UNPACK32BE(buf);
                        buf+=4;
                        break;
                case 'L':
                        u64 = va_arg(args, uint64_t*);
                        *u64 = UNPACK64BE(buf);
                        buf+=8;
                        break;
                case 's':
                        u16 = va_arg(args,uint16_t*);
                        *u16 = UNPACK16LE(buf);
                        buf+=2;
                        break;
                case 'i':
                        u32 = va_arg(args, uint32_t*);
                        *u32 = UNPACK32LE(buf);
                        buf+=4;
                        break;
                case 'l':
                        u64 = va_arg(args, uint64_t*);
                        *u64 = UNPACK64LE(buf);
                        buf+=8;
                        break;
                case ' ':
                        ++buf;
                        break;
                }
        }

}

void
depickle(const uint8_t *buf, const char *fmt, ...)
{
        va_list args;

        va_start(args, fmt);
        vadepickle(buf, fmt, args);
        va_end(args);

}

int main(void)
{
        uint32_t i = 0x123456;
        uint16_t j = 0xabab;
        uint16_t k = 0xbbab;
        uint8_t buf[8]; /* keep this big enough for the format string: 2+4+2 bytes here */

        /* s = uint16 little endian, i = uint32 little endian, S = uint16 big endian, etc. */
        pickle(buf,"sis",j,i,k);
        write(1,buf,sizeof buf);
        read(0,buf,sizeof buf);
        depickle(buf,"sis",&j,&i,&k); /* must use the same format string as pickle() */

        return 0;

}
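
The floating-point support left out above could be sketched as follows. This is only one possible approach: pack_float_be and unpack_float_be are hypothetical helpers, and they assume that float is a 32-bit IEEE 754 value whose byte order in memory matches that of uint32_t on the host, which holds on common platforms but is not guaranteed by the C standard.

```c
#include <stdint.h>
#include <string.h>

/* Hypothetical helper: store a float as 4 big-endian bytes.
   Assumes a 32-bit IEEE 754 float. */
static void pack_float_be(uint8_t *buf, float f)
{
    uint32_t bits;
    memcpy(&bits, &f, sizeof bits);     /* grab the float's bit pattern */
    buf[0] = (uint8_t)(bits >> 24);
    buf[1] = (uint8_t)(bits >> 16);
    buf[2] = (uint8_t)(bits >> 8);
    buf[3] = (uint8_t)bits;
}

/* Hypothetical helper: rebuild a float from 4 big-endian bytes. */
static float unpack_float_be(const uint8_t *buf)
{
    uint32_t bits = ((uint32_t)buf[0] << 24) | ((uint32_t)buf[1] << 16) |
                    ((uint32_t)buf[2] << 8)  |  (uint32_t)buf[3];
    float f;
    memcpy(&f, &bits, sizeof f);        /* put the bit pattern back */
    return f;
}
```

If this were wired into vapickle with a new format character, the argument would have to be fetched with va_arg(args, double), because a float passed through "..." is promoted to double.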

nos 2009-11-09 20:12:51

Is pickle a widely used term (I thought serializing, marshalling) or do you just like little cucumbers fermented in vinegar?

wsd 2009-11-11 14:44:21

Python has a 'pickle' module that does something similar to this...

Jeremy Friesner 2009-11-12 18:07:41

Answer 4

+1 A:

This discussion seems relevant to your question, but it uses the Boost serialization API.

Moshe Kravchik 2009-11-09 20:16:31

I think Boost will teach you a lot (you can look up the implementation). It will also give you a ready solution to numerous issues you might not think of yourself.

Moshe Kravchik 2009-11-09 20:19:17

I am browsing around the Boost Serialization API as I write this (in another tab xD), but it seems overkill for what I want to do. Trying to programmatically distill it...

wsd 2009-11-09 20:27:39

Answer 5

+7 A:

Ah, you want to serialize primitive data types! In principle, there are two approaches. The first is to grab the internal, in-memory binary representation of the data you want to serialize, reinterpret it as a sequence of characters, and use that as your representation:

So if you have a:

double d;

you take the address of that, reinterpret that pointer as a pointer to character, and then use these characters:

double *pd=&d;
char *pc = reinterpret_cast<char*>(pd); 
for(size_t i=0; i<sizeof(double); i++) 
{
   char ch = *pc;   
   DoSomethingWith(ch);   
   pc++;
}

This works with all primitive data types. The main problem here is that the binary representation is implementation dependent (mainly CPU dependent). (And you will run into subtle bugs when you try doing this with IEEE NaNs...)

All in all, this approach is not portable at all, as you have no control at all over the representation of your data.
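
To make the CPU dependence concrete, here is a small sketch that detects the host byte order at run time and emits an object's bytes most significant byte first regardless of the host. The function names are hypothetical, and the code assumes the host is either plain little-endian or plain big-endian (no mixed-endian layouts).

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Returns 1 when the host stores the least significant byte first. */
static int host_is_little_endian(void)
{
    const uint16_t probe = 1;
    unsigned char first;
    memcpy(&first, &probe, 1);  /* look at the byte at the lowest address */
    return first == 1;
}

/* Copy the n bytes of an object into out, most significant byte first.
   Assumes a plain little-endian or plain big-endian host. */
static void to_big_endian(const void *obj, size_t n, unsigned char *out)
{
    const unsigned char *src = (const unsigned char *)obj;
    for (size_t i = 0; i < n; i++)
        out[i] = host_is_little_endian() ? src[n - 1 - i] : src[i];
}
```

This fixes the byte order only; it does nothing about differing type sizes or non-IEEE floating-point formats.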

The second approach is to use a higher-level representation that you control yourself. If performance is not an issue, you could use std::stringstream (std::strstream is deprecated) and the >> and << operators to stream primitive C type variables into std::strings. This is slow but easy to read and debug, and very portable on top of it.
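
A text representation can also be produced in plain C, with snprintf and sscanf standing in for the C++ stream operators mentioned above. The sketch below is illustrative only: encode_text and decode_text are hypothetical names, the "<unsigned> <float>" format is an arbitrary choice, and it assumes an IEEE single-precision float and the default "C" locale.

```c
#include <stdio.h>

/* Write x and z as text, e.g. "123 1.5".
   Returns the number of characters written (excluding the NUL),
   as snprintf does. */
static int encode_text(char *out, size_t cap, unsigned int x, float z)
{
    /* 9 significant digits are enough to round-trip an IEEE single */
    return snprintf(out, cap, "%u %.9g", x, (double)z);
}

/* Parse the text back. Returns 1 on success, 0 on malformed input. */
static int decode_text(const char *in, unsigned int *x, float *z)
{
    return sscanf(in, "%u %f", x, z) == 2;
}
```

Because the wire format is text, byte order and type sizes stop mattering, at the cost of larger messages and slower parsing.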

Carsten Kuckuk 2009-11-09 20:23:59

+1 for highlighting issues, and adding undefined padding. And I'll take the bait :), what are the subtle bugs with IEEE NaNs in this scenario? Thanks..

rama-jka toti 2009-11-09 20:41:15

There are signalling NaNs and non-signalling (quiet) NaNs. When you work with these representations as char arrays, you can read and write them easily. But when you access them as floats, just the act of reading them can cause the CPU to signal. So if you're not careful, you can end up with a program that deserializes without a problem, but once you touch the float, you end up in trouble. And as this thread is about learning, I thought I might point out this area.
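
One way to stay out of that trouble is to inspect the received bit pattern as an integer before ever treating it as a float. The sketch below uses hypothetical function names and assumes IEEE 754 single precision with the common convention (used on x86 and ARM, among others) that the top mantissa bit is set for quiet NaNs and clear for signalling ones; some older architectures use the opposite sense.

```c
#include <stdint.h>

/* NaN: exponent bits all ones and a non-zero mantissa.
   Works on the raw bits, so nothing is loaded as a float. */
static int bits_are_nan(uint32_t bits)
{
    return (bits & 0x7F800000u) == 0x7F800000u &&
           (bits & 0x007FFFFFu) != 0;
}

/* Signalling NaN: a NaN whose top mantissa bit (the "quiet" bit) is clear.
   Assumes the quiet-bit-set convention described in the lead-in. */
static int bits_are_signalling_nan(uint32_t bits)
{
    return bits_are_nan(bits) && (bits & 0x00400000u) == 0;
}
```

A deserializer could run this check on the 32-bit value and reject or replace signalling NaNs before copying the bits into a float.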

Carsten Kuckuk 2009-11-09 20:48:24

+1, and I haven't seen it mentioned here in this context.. although vendors tend to avoid marshalling, as well as serialising floats of any kind, finally :)

rama-jka toti 2009-11-09 21:25:17

performance is always an issue :) Still this is the clearest answer on this post, thank you!

wsd 2009-11-11 14:40:23
