Hi colleagues

I have a binary file containing "messages", and I am trying to map the bytes onto the right variables using structs. My example uses two message types: Tmessage and Amessage.

#include <iostream>
#include <fstream>
#include <stdlib.h>
#include <string>
#include <iomanip>

using namespace std;

struct Tmessage
{
    unsigned short int Length;
    char MessageType;
    unsigned int Second;
};

struct Amessage
{
    unsigned short int Length;
    char MessageType;
    unsigned int Timestamp;
    unsigned long long int OrderReferenceNumber;
    char BuySellIndicator;
    unsigned int Shares;
    char Stock[6];
    unsigned int Price;
};

int main(int argc, char* argv[])
{
    const char* filename = argv[1];
    fstream file(filename, ios::in | ios::binary);
    unsigned long long int pi = 0;

    if(file.is_open()){ cout << filename << " OPENED" << endl; }
    else { cout << "FILE NOT OPENED" << endl; }

    unsigned char* memblock;
    memblock = new unsigned char[128];
    file.read((char *)memblock, 128);

    cout <<  "BINARY DATA" << endl;
    while (pi < 128)
    {
        cout << setw(2) << hex << static_cast<unsigned int>(memblock[pi]) << " ";
        pi++;
        if((pi%16)==0) cout << endl;
    }

    unsigned int poi = 0;

    Tmessage *Trecord;
    Trecord = (Tmessage *)memblock;
    cout << "Length: " << hex << (*Trecord).Length << endl;
    cout << "Message type: " << hex << (*Trecord).MessageType << endl;
    cout << "Second: " << hex << (*Trecord).Second << endl;

    poi = poi + 7; cout << endl;

    Amessage *Arecord;
    Arecord = (Amessage *)(memblock+poi);
    cout << "Length: " << hex << (*Arecord).Length << endl;
    cout << "Message type: " << hex << (*Arecord).MessageType << endl;
    cout << "Timestamp: " << hex << (*Arecord).Timestamp << endl;
    cout << "OrderReferenceNumber: " << hex << (*Arecord).OrderReferenceNumber << endl;
    cout << "BuySellIndicator: " << hex << (*Arecord).BuySellIndicator << endl;
    cout << "Shares: " << hex << (*Arecord).Shares << endl;
    cout << "Stock: " << hex << (*Arecord).Stock << endl;
    cout << "Price: " << hex << (*Arecord).Price << endl;

    delete[] memblock;  // memblock was allocated with new[], so it needs delete[]
    file.close();
    cout << endl << "THE END" << endl;
    return 0;
}

The output when I run the program:

stream OPENED
BINARY DATA
 0  5 54  0  0 62 72  0 1c 41  0  f 42 40  0  0 
 0  0  0  4 2f 76 53  0  0  3 e8 53 50 59 20 20 
20  0 11  5 d0  0 1c 41  0  f 42 40  0  0  0  0 
 0  4 2f 78 42  0  0  3 e8 53 50 59 20 20 20  0 
10 f7 5c  0 1c 41  0  f 42 40  0  0  0  0  0  4 
2f 90 53  0  0  1 2c 53 50 59 20 20 20  0 11  2 
b0  0  5 54  0  0 62 76  0  d 44 14 25 78 80  0 
 0  0  0  0  4 2f 90  0  d 44 14 25 78 80  0  0 
Length: 500
Message type: T
Second: 726200

Length: 1c00
Message type: A
Timestamp: 40420f
OrderReferenceNumber: 53762f0400000000
BuySellIndicator: 
Shares: 20595053
Stock:   
Price: 420f0041

THE END

The program places the bytes inside the Tmessage struct correctly. (0 5 54 0 0 62 72)
However, something goes wrong while parsing the Amessage.
(0 1c 41 0 f 42 40 0 0 0 0 0 4 2f 76 53 0 0 3 e8 53 50 59 20 20 20 0 11 5 d0)

The Length, MessageType and Timestamp are correct, but OrderReferenceNumber contains the "53" byte, which belongs to BuySellIndicator, and the remaining variables are incorrect as well.

The correct A message output should be:
Length: 1c 0
Message type: 41
Timestamp: 40 42 f 0
OrderReferenceNumber: 76 2f 4 0 0 0 0 0
BuySellIndicator: 53
Shares: e8 3 0 0
Stock: 53 50 59 20 20 20
Price: d0 5 11 0

My two questions: a) Why does OrderReferenceNumber contain the "53" byte? b) I think "char Stock[6]" does not work, because there are more than 6 bytes between the bytes of Shares and the bytes of Price. How can I fit the 6 bytes into a char array or a string?

Note: I am aware that I have to swap the bytes because the binary data is big-endian. "Stock" is a character array, so it should not be swapped. Thank you very much for your help! Kind regards,

+3  A: 

There may be unnamed padding bytes between data members of a struct.
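One way to see the padding is to print where the compiler actually places each member. The sketch below reuses the Amessage struct from the question; print_layout, padding_bytes and kMemberBytes are names made up for this illustration. The offsets are implementation-defined, so no particular output is promised here, but the output shown in the question is consistent with Timestamp starting at offset 4 rather than 3, i.e. with one padding byte after MessageType.

```cpp
#include <cassert>
#include <cstddef>   // offsetof, std::size_t
#include <cstdio>

// Same declaration as in the question.
struct Amessage
{
    unsigned short int Length;
    char MessageType;
    unsigned int Timestamp;
    unsigned long long int OrderReferenceNumber;
    char BuySellIndicator;
    unsigned int Shares;
    char Stock[6];
    unsigned int Price;
};

// Bytes occupied by the members alone, without any padding.
const std::size_t kMemberBytes =
    sizeof(unsigned short) + sizeof(char) + sizeof(unsigned int) +
    sizeof(unsigned long long) + sizeof(char) + sizeof(unsigned int) +
    6 + sizeof(unsigned int);

// Hypothetical helper: total number of padding bytes the compiler added.
std::size_t padding_bytes()
{
    return sizeof(Amessage) - kMemberBytes;
}

// Hypothetical helper: print the offset of every member.
void print_layout()
{
    // A struct can never be smaller than its members put together.
    assert(sizeof(Amessage) >= kMemberBytes);

    std::printf("Length               at %zu\n", offsetof(Amessage, Length));
    std::printf("MessageType          at %zu\n", offsetof(Amessage, MessageType));
    std::printf("Timestamp            at %zu\n", offsetof(Amessage, Timestamp));
    std::printf("OrderReferenceNumber at %zu\n", offsetof(Amessage, OrderReferenceNumber));
    std::printf("BuySellIndicator     at %zu\n", offsetof(Amessage, BuySellIndicator));
    std::printf("Shares               at %zu\n", offsetof(Amessage, Shares));
    std::printf("Stock                at %zu\n", offsetof(Amessage, Stock));
    std::printf("Price                at %zu\n", offsetof(Amessage, Price));
    std::printf("sizeof(Amessage) = %zu, padding = %zu\n",
                sizeof(Amessage), padding_bytes());
}
```

Calling print_layout() from main and comparing the offsets with the field positions in the file (0, 2, 3, 7, 15, 16, 20, 26) shows which members are shifted.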

In order to read binary data from a file in a portable manner, you should read each member of the struct individually.

You should also use the exact-width types specified in <cstdint>, such as uint16_t, uint32_t and uint64_t (Boost has an implementation of this header if your standard library doesn't have it yet); this will allow you to ensure that the sizes of your data members match the sizes of the fields in the message.
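A minimal sketch of reading each member individually might look like the following. It assumes the field widths implied by the question (2, 1, 4, 8, 1, 4, 6 and 4 bytes, big-endian); read_be, AmessageFields and parse_amessage are hypothetical names introduced for this example. Because every integer is assembled byte by byte, neither struct padding nor the byte order of the host matters.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <string>

// Hypothetical helper: build an unsigned integer from n big-endian bytes.
std::uint64_t read_be(const unsigned char* p, std::size_t n)
{
    assert(n <= 8);  // the result has to fit into 64 bits
    std::uint64_t v = 0;
    for (std::size_t i = 0; i < n; ++i)
        v = (v << 8) | p[i];
    return v;
}

// In-memory form of an A message; its layout no longer has to match the file.
struct AmessageFields
{
    std::uint16_t Length;
    char          MessageType;
    std::uint32_t Timestamp;
    std::uint64_t OrderReferenceNumber;
    char          BuySellIndicator;
    std::uint32_t Shares;
    std::string   Stock;   // the 6 characters, copied without any swapping
    std::uint32_t Price;
};

// Decode one A message from p, which must point to at least 30 bytes.
AmessageFields parse_amessage(const unsigned char* p)
{
    AmessageFields m;
    m.Length               = static_cast<std::uint16_t>(read_be(p, 2)); p += 2;
    m.MessageType          = static_cast<char>(*p);                     p += 1;
    m.Timestamp            = static_cast<std::uint32_t>(read_be(p, 4)); p += 4;
    m.OrderReferenceNumber = read_be(p, 8);                             p += 8;
    m.BuySellIndicator     = static_cast<char>(*p);                     p += 1;
    m.Shares               = static_cast<std::uint32_t>(read_be(p, 4)); p += 4;
    m.Stock.assign(reinterpret_cast<const char*>(p), 6);                p += 6;
    m.Price                = static_cast<std::uint32_t>(read_be(p, 4));
    return m;
}
```

With the buffer from the question, parse_amessage(memblock + 7) would decode the first A message; the same approach works for Tmessage with a shorter field list.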

James McNellis
Compiler (alignment / noalignment) directives may also affect the padding of structures and objects, so it'd be best to do this even for a non-portable program, as otherwise it may behave differently for no readily obvious reason.
Brian Hooper
+1  A: 

The compiler is probably inserting pad bytes between members of your struct. One way to get around this is to use #pragma pack. Note that this is non-standard, but it works on g++ and Visual C++.

#pragma pack (push, 1)
struct Amessage
{
    unsigned short int Length;
    char MessageType;
    unsigned int Timestamp;
    unsigned long long int OrderReferenceNumber;
    char BuySellIndicator;
    unsigned int Shares;
    char Stock[6];
    unsigned int Price;
};
#pragma pack (pop)

Here is what the code above does: #pragma pack tells the compiler not to insert the padding it would normally add so that each member of the struct is accessed at an aligned address. The push/pop pair lets you nest #pragma pack directives (for example, when including header files) and restore the previously set packing options afterwards.
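To check that the packing really took effect, the layout can be verified at compile time. The sketch below is an illustration under a few assumptions: a C++11 compiler (for static_assert and nullptr), a compiler that honours #pragma pack, and the 30-byte message size implied by the question. PackedAmessage and load_amessage are hypothetical names. It copies the bytes with memcpy instead of casting the buffer pointer, which avoids reading through a misaligned pointer. The multi-byte members still hold the file's big-endian byte order and need swapping on a little-endian machine.

```cpp
#include <cassert>
#include <cstddef>   // offsetof
#include <cstdint>
#include <cstring>   // std::memcpy

#pragma pack (push, 1)
struct PackedAmessage
{
    std::uint16_t Length;
    char          MessageType;
    std::uint32_t Timestamp;
    std::uint64_t OrderReferenceNumber;
    char          BuySellIndicator;
    std::uint32_t Shares;
    char          Stock[6];
    std::uint32_t Price;
};
#pragma pack (pop)

// Fail the build if the compiler still added padding.
static_assert(sizeof(PackedAmessage) == 30, "unexpected padding in PackedAmessage");
static_assert(offsetof(PackedAmessage, Timestamp) == 3, "Timestamp is not at offset 3");
static_assert(offsetof(PackedAmessage, Price) == 26, "Price is not at offset 26");

// Copy one raw message into the packed struct; p must point to at least 30 bytes.
PackedAmessage load_amessage(const unsigned char* p)
{
    assert(p != nullptr);
    PackedAmessage m;
    std::memcpy(&m, p, sizeof m);
    return m;
}
```

The single-byte members and Stock can be used directly after load_amessage; Length, Timestamp, OrderReferenceNumber, Shares and Price would still need a byte swap before use.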

See MSDN for an explanation that's probably better than the one I could give. http://msdn.microsoft.com/en-us/library/2e70t5y1%28VS.80%29.aspx

George