views:

1664

answers:

17

At a previous employer, we were writing binary messages that had to go "over the wire" to other computers. Each message had a standard header something like:

class Header
{
    int type;
    int payloadLength;
};

All of the data was contiguous (header, immediately followed by data). We wanted to get to the payload given that we had a pointer to a header. Traditionally, you might say something like:

char* Header::GetPayload()
{
    return ((char*) &payloadLength) + sizeof(payloadLength);
}

or even:

char* Header::GetPayload()
{
    return ((char*) this) + sizeof(Header);
}

That seemed kind of verbose, so I came up with:

char* Header::GetPayload()
{
    return (char*) &this[1];
}

It seems rather disturbing at first, possibly too odd to use -- but very compact. There was a lot of debate on whether it was brilliant or an abomination.

So which is it - Crime against coding, or nice solution? Have you ever had a similar trade-off?

-Update:

We did try the zero sized array, but at the time, compilers gave warnings. We eventually went to the inhertited technique: Message derives from Header. It works great in practice, but in priciple you are saying a message IsA Header - which seems a little awkward.

+13  A: 

Personally I think that if there's a crime, it's asking the header for the payload.

But as long as you're going to do it that way, 'this+1' is as good a way as any.

Justification: '&this[1]' is a general-purpose piece of code that doesn't require you to go digging through class-definitions to fully comprehend, and doesn't require fixing when someone changes the name or contents of the class.

BTW, the first example is the true crime against humanity. Add a member to the end of the class and it'll fail. Move the members around the class and it'll fail. If the compiler pads the class, it'll fail.

Also, if you're going to assume that the compiler's layout of classes/structs matches your packet layout, then you should understand how the compiler in question works. Eg. on MSVC you'll probably want to know about #pragma pack.

PS: It's a little scary how many people consider "this+1" or "&this[1]" hard to read or understand.

Menkboy
+5  A: 

My vote is Coding Horror. Don't get me wrong, it's clever - but you're saving yourself one entire addition operation at the cost of making the code much more difficult to understand and read. I don't see the tradeoff as worth it.

Tom Ritter
It's not even saving an addition - the compiler must still generate it to get the address of the second element. It's just being hidden by the syntax.
Mark Ransom
+42  A: 

I'd go for crime against coding.

Both methods will generate the exact same object code. The first makes it's intention clear. The second is very confusing, with the only advantage that it saves a couple keystrokes. (Just learn to freakin' type).

Also, note that NEITHER method is guaranteed to work. The sizeof() an object includes padding for word alignment, so that if the header was:

class Header
{
    int type;
    int payloadLength;
    char  status;
};

Both methods you describe would have the payload starting at Header+12, when most likely it actually starts at Header+9.

James Curran
+1 for "Jsust learn to freakin' type"
Tanktalus
For tips on how to type faster, see http://stackoverflow.com/questions/94101/how-to-type-faster
James Schek
Menkboy
Ates Goral
Jeff Yates
Menkboy
It is referencing one "Header" beyond Header and therefore pointing at the beginning of the payload. I think my previous comment is poorly worded. I can see situations where some may change Header and think they have to change this too, when they don't. Yes, they should know better, but some don't.
Jeff Yates
What I'm trying to say is, if you have to explain it to one person, there's more that will need it explaining and whether you believe they should know or not, you have to code for what will happen not what you'd like to happen. That's the nature of professional software development.
Jeff Yates
@ffpf: I can see that. But OTOH, the people you explain it to will have learnt something, and probably won't have to ask again.
Menkboy
@Ates Goral: I like it! :D
KTC
Tiny code is no replacement for readable code. Sure, it might be fun to try to make something as small as possible, but it harms the ability to read and understand the code.
TM
Should never use compiler structures over the wire. Endianness issues, compiler padding issues between projects, CPU word sizes etc. Learn how to use a real solution. Google StreamBuffers is (for example) much clearer, and language independant.
Tim Williscroft
A: 

Perhaps you should have used a verbose method, but replaced it with a #define macro? This way you can use your shorthand when typing, but anyone needing to debug the code can follow along without issue.

Nighthawk
+1  A: 

They're basically the same thing as far as I'm concerned. Both are forms of byte juggling, which is always risky, but not impossible to get right. The first form is a bit more accepted and recognizable. I would personally write :

char* Header::GetPayload()
{
    return ((char*) this) + sizeof(*this);
}
QBziZ
+5  A: 

I think this is flawed from the start on if the header needs to "return" data which is not included in it.

As you already put yourself on these hackish grounds, I really love what you came up with.

But note that this is not a beauty contest. You should find an entire different solution. For alle the three versions of GetPayload() you presented, I would not understand what the hell is going on there without your further explanation.

ypnos
+3  A: 

If it works -- consistently -- then it's an elegant solution.

It will normally work in memory because the compiler will deal with alignment issues and you can assume that the Payload follows the header in correctly-aligned memory space.

I could see this falling apart when the Header/Payload objects are streamed "over the wire" because the streaming mechanism you use will probably not care about aligning objects on any particular boundary. Therefore, Payload may directly follow the Header with no padding to force it to a particular alignment.

To coin a phrase, elegant is as elegant does. So, it's elegant as long as you are careful how you stream it.

Kluge
+14  A: 

You're depending on the compiler to layout your classes in a particular way. I would have defined the message as a struct (with me defining layout) and had a class that encapsulates the message and provides the interface to it. Clear code = good code. "Cute" code = bad (hard to maintain) code.

struct Header
{
    int type;
    int payloadlength;
}
struct MessageBuffer
{
   struct Header header;
   char[MAXSIZE] payload;
}

class Message
{
  private:
   MessageBuffer m;

  public:
   Message( MessageBuffer buf ) { m = buf; }

   struct Header GetHeader( )
   {
      return m.header;
   }

   char* GetPayLoad( )
   {
      return &m.payload;
   }
}

It's been awhile since I've written any C++, so please excuse any issues with syntax. Just trying to convey the general idea.

tvanfosson
I implemented a peer to peer filesharing system that used essentially exactly this. It worked really well. I had a factory create different subclasses of Message based on the type so that I didn't have to deal with raw payload though. Not too proud of the RTTI required, but it was isolated.
rmeador
This is also the pattern that came to mind for me. I'm surprised at the number of people who consider the techniques used in the question to be 'elegant'. Maybe clever hacks, but the kind of thing that I thought best practices have been trying to eliminate over the past 15 years (or more).
Michael Burr
+1  A: 

Don't forget that VC++ may impose a padding on the sizeof() value on the class. Since the provided example is expected to be 8 bytes, it is automatically DWORD aligned, so should be ok. Check #pragma pack.

Though, I agree, the provided examples are some degree of Coding Horror. Many Win32 data structures include a pointer placeholder in the header structure when variable length data follows. This is probably the easiest way to reference this data once it's loaded into memory. The MAPI SRowSet structure is one example of this approach.

spoulson
+2  A: 

First, there's an awful lot of space between "crime against coding" and "nice solution," but I'd say this is closer to the former.

Is the Header its Payload's keeper?

That's the fundamental problem here -- both the header and the payload should be managed by another object that holds the entire message, and that is the proper place to ask for the payload. And it would be able to do so without pointer arithmetic or indexing.

Given that, I would favor the second solution, since it is clearer what is going on.

But that we're in this situation to begin with seems to indicate that the culture of your team values cleverness over clarity, so I guess all bets are off.

If you really want to be cute, you could generalize.

template<typename T. typename RetType>
RetType JustPast(const T* pHeader)
{
   return reinterpret_cast<RetType>(pHeader + sizeof(T));
}
JohnMcG
+4  A: 

Have you considered the 'empty array member' trick? I remember seeing it often, and even using it once or twice, but I can't seem to find any really good references (except, maybe, the one referenced below).

The trick is to declare your struct as

struct bla {
    int i;
    int j;
    char data[0];
}

Then, the 'data' member simply points to whatever is behind the headers. I am not sure how portable this is; I've seen it with '1' as the array size as well.

(using the URL below as a referece, using the '[1]' syntax, seemed not to work because it is too long. Here is the link:)

http://developer.apple.com/documentation/DeveloperTools/gcc-4.0.1/gcc/Zero-Length.html

Jan de Vos
The BITMAPINFO structure in Win32 uses this for the variable amount of RGBQUADs.
Johann Gerell
Microsoft uses this method too, e.g. the BITMAPINFO structure. http://msdn.microsoft.com/en-us/library/ms532284(VS.85).aspx
Mark Ransom
Also the POSIX dirent structure contains a char[] "of unspecified size".
Steve Jessop
+1  A: 

I actually do something similar, and so does nearly every MMO or online video game ever written. Although they have a concept called a "Packet" and each packet has it's own layout. So you might have:

struct header
{
    short id;
    short size;
}

struct foo
{
    header hd;
    short hit_points;
}


short get_foo_data(char *packet)
{
    return reinterpret_cast<foo*>(packet)->hit_points;
}

void handle_packet(char *packet)
{
    header *hd = reinterpret_cast<header*>(packet);
    switch(hd->id)
    {
        case FOO_PACKET_ID:
            short val = get_foo_data(packet);
        //snip
    }
}

And they do that for the majority of their packets. Some packets obviously have dynamic sizes, and for those members they use length prefixed fields and some logic to parse that data.

Raindog
+13  A: 

It's a common problem, but what you actually want is this.

class Header
{
    int type;
    int payloadLength;
    char payload[0];

};

char* Header::GetPayload()
{
    return payload;
}
Roddy
That's probably the best solution to the original question - the way I'd do it.
Richard Harrison
This is the design pattern used by the big boys. Gets my vote.
jussij
A: 

I vote for &this[1]. I've seen it used quite a bit when parsing files that have been loaded into memory (which can equally include received packets). It may look a tad odd the first time you see it, but I think that what it means should be immediately obvious: it's clearly the address of memory just past this object. It's nice because it's hard to get wrong.

+1  A: 

I think in this day and age, in C++, the C-style cast to char* disqualifies you from any "brilliant design idea" awards without getting much of a hearing.

I might go for:

#include <stdint.h>
#include <arpa/inet.h>

class Header {
private:
    uint32_t type;
    uint32_t payloadlength;
public:
    uint32_t getType() { return ntohl(type); }
    uint32_t getPayloadLength() { return ntohl(payloadlength); }
};

class Message {
private:
    Header head;
    char payload[1]; /* or maybe std::vector<char>: see below */
public:
    uint32_t getType() { return head.getType(); }
    uint32_t getPayloadLength() { return head.getPayloadLength(); }
    const char *getPayload() { return payload; }
};

This assumes C99-ish POSIX, of course: to port to non-POSIX platforms you'd have to define one or both of uint32_t and ntohl yourself, in terms of whatever the platform does offer. It's usually not hard.

In theory you might need layout pragmas in both classes. In practice I'd be surprised given the actual fields in this case. The issue can be avoided by reading/writing the data from/to iostreams one field at a time, rather than trying to construct the bytes of the message in memory and then write it in one go. It also means you can represent the payload with something more helpful than a char[], which in turn means you won't need to have a maximum message size, or mess about with malloc and/or placement new, or whatever. Of course it introduces a bit of overhead.

Steve Jessop
A: 

I don't like to use words like "crime". I would rather point out that &this[1] seems to make assumptions about memory layout that a compiler might disagree with. For example, any compiler might, for its own reasons (like alignment), insert dummy bytes anywhere in a structure. I would prefer a technique that has more of a guarantee of getting the correct offset if compilers or options get changed.

Mike Dunlavey
A: 

In addition to the abovementioned, I'd say that this is a crime against interoperability and good wire protocol design principles. It is really surprising how many programmers are not able/willing to make a clear distinction between a wire protocol definition and its implementation. If your protocol has to survive for more than two days, it most probably has to survive for more than two years/OSes/compilers/languages/endiannesses and in some point it will break, rather sooner than later. So, make other folks' life easier, write down the wire protocol specification plus write proper (de)serialization routines. Otherwise, people will keep mentioning your name in not so pleasant contexts.

Mart Oruaas