I have a series of messages that are defined by independent structs. These structs share a common header and are sent between applications. I am creating a decoder that will take the raw data captured from messages built with these structs and decode/parse it into plain text.

I have over 1000 different messages that need to be decoded, so I am not sure whether defining all the struct formats in XML and then using XSL or some other translation is the way to go, or whether there is a better way to do this.

There are times when I will need to decode logs containing over a million messages so performance is a concern.

Any recommendations for techniques/tools/algorithms to go about creating the decoder/parser?

struct: struct { dword messageid; dword datavalue1; dword datavalue2; } struct1;

Example raw data: 010101010A0A0A0A0F0F0F0F

Decoded message (desired output): message id: 0x01010101, datavalue1: 0x0A0A0A0A, datavalue2: 0x0F0F0F0F

I am using C++ to do this development.

A: 

Regarding "performance": if you are doing disk I/O and possibly display I/O, I doubt your parser/decoder will have much effect on the total run time unless you use a truly horrible algorithm.

I am also unsure what the problem is. As the question stands, you have 3 DWORDs in a struct and you say there are over 1000 unique messages based on these values.

Your decoded message does not imply to me that you need any kind of parsing; straight output seems to work (convert the bytes to an ASCII representation of a hex value).
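As a minimal sketch of that "straight output" idea, the following formats the three DWORDs of the example struct directly. The function names `readDwordBE` and `formatStruct1` are made up for illustration, and the byte order (most significant byte first) is an assumption, since the sample data does not reveal it.

```cpp
#include <cstddef>
#include <cstdint>
#include <cstdio>
#include <string>
#include <vector>

// Read a 32-bit value from 4 bytes, most significant byte first.
// (Assumed byte order; swap the shifts for little-endian captures.)
std::uint32_t readDwordBE(const std::uint8_t* p) {
    return (std::uint32_t(p[0]) << 24) | (std::uint32_t(p[1]) << 16) |
           (std::uint32_t(p[2]) << 8) | std::uint32_t(p[3]);
}

// Format one struct1 record (three DWORDs, 12 bytes) as text.
// Returns an empty string if fewer than 12 bytes are available.
std::string formatStruct1(const std::vector<std::uint8_t>& raw) {
    if (raw.size() < 12) return std::string();
    char line[96];
    std::snprintf(line, sizeof line,
                  "message id: 0x%08lX, datavalue1: 0x%08lX, datavalue2: 0x%08lX",
                  static_cast<unsigned long>(readDwordBE(&raw[0])),
                  static_cast<unsigned long>(readDwordBE(&raw[4])),
                  static_cast<unsigned long>(readDwordBE(&raw[8])));
    return line;
}
```

Fed the twelve bytes 01 01 01 01 0A 0A 0A 0A 0F 0F 0F 0F, this yields the text asked for in the question.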

If you do have a mapping from a value to a string, then a big switch statement is simple. Alternatively, if you want to be able to add entries dynamically or change the display, I would provide the key/value pairs (the mapping) in a config file (text, XML, etc.) and then do a lookup as the log file/raw data is read.

A std::map is what I would use in that case.
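A rough sketch of the config-file lookup, assuming a plain text file with one `<hex id> <name>` pair per line (for example `0x01010101 Heartbeat`). That file format, the single-word names, and the identifiers `MessageNames`, `loadMessageNames` and `lookupName` are all invented for this example.

```cpp
#include <cstddef>
#include <cstdint>
#include <istream>
#include <map>
#include <sstream>
#include <stdexcept>
#include <string>

using MessageNames = std::map<std::uint32_t, std::string>;

// Load "<hex id> <name>" pairs, one per line. Lines that do not parse are skipped.
MessageNames loadMessageNames(std::istream& in) {
    MessageNames names;
    std::string line;
    while (std::getline(in, line)) {
        std::istringstream fields(line);
        std::string idText, name;
        if (!(fields >> idText >> name)) continue;  // blank or incomplete line
        try {
            std::size_t used = 0;
            unsigned long id = std::stoul(idText, &used, 16);  // accepts an optional 0x prefix
            if (used != idText.size() || id > 0xFFFFFFFFul) continue;
            names[static_cast<std::uint32_t>(id)] = name;
        } catch (const std::exception&) {
            // idText was not a hex number: skip the line
        }
    }
    return names;
}

// Look up the display name for a message id, falling back to "unknown".
std::string lookupName(const MessageNames& names, std::uint32_t id) {
    MessageNames::const_iterator it = names.find(id);
    return it != names.end() ? it->second : std::string("unknown");
}
```

The map is loaded once at startup, so each of the million messages costs only one logarithmic lookup; std::unordered_map would be a drop-in alternative if that ever shows up in a profile.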

Perhaps if you provide another specific example of the values and decoded output I can come up with a more appropriate suggestion.

Tim
A: 

If you have the message definitions already given in the syntax that you've used in your example, you should definitely not try to convert it manually into some other syntax (XML or otherwise).

Instead, you should try to write a compiler that takes these message definitions and compiles them into a decoder function.

These days, the recommendation is to use ANTLR as the parser generator, using any of the ANTLR languages for the actual compiler (Java, Python, Ruby, C#, C++). That compiler then should output C code, which does the entire decoding and pretty-printing.
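To give a feel for the front end of such a compiler, here is a small sketch that reads a definition in the question's syntax into a field list. It deliberately swaps ANTLR for a hand-written tokenizer to stay self-contained, and it only understands the flat `struct { type name; ... } structname;` form; `Field`, `StructDef`, `tokenize` and `parseStruct` are hypothetical names.

```cpp
#include <cctype>
#include <cstddef>
#include <string>
#include <vector>

struct Field { std::string type, name; };
struct StructDef { std::string name; std::vector<Field> fields; };

// Split the text into identifiers and the punctuation tokens '{', '}' and ';'.
static std::vector<std::string> tokenize(const std::string& text) {
    std::vector<std::string> tokens;
    std::size_t i = 0;
    while (i < text.size()) {
        unsigned char c = static_cast<unsigned char>(text[i]);
        if (std::isalnum(c) || c == '_') {
            std::size_t start = i;
            while (i < text.size() &&
                   (std::isalnum(static_cast<unsigned char>(text[i])) || text[i] == '_'))
                ++i;
            tokens.push_back(text.substr(start, i - start));
        } else {
            if (c == '{' || c == '}' || c == ';') tokens.push_back(std::string(1, text[i]));
            ++i;  // whitespace and unrecognized characters are skipped
        }
    }
    return tokens;
}

static bool isIdent(const std::string& tok) {
    return tok != "{" && tok != "}" && tok != ";";
}

// Parse "struct { type name; ... } structname;". Returns false on malformed input.
bool parseStruct(const std::string& text, StructDef& out) {
    const std::vector<std::string> t = tokenize(text);
    if (t.size() < 5 || t[0] != "struct" || t[1] != "{") return false;
    StructDef def;
    std::size_t i = 2;
    while (i < t.size() && t[i] != "}") {
        // each member is exactly: type name ;
        if (i + 2 >= t.size() || !isIdent(t[i]) || !isIdent(t[i + 1]) || t[i + 2] != ";")
            return false;
        def.fields.push_back({t[i], t[i + 1]});
        i += 3;
    }
    // after the members: } structname ;
    if (i + 3 != t.size() || !isIdent(t[i + 1]) || t[i + 2] != ";") return false;
    def.name = t[i + 1];
    out = def;
    return true;
}
```

A real set of 1000+ definitions will likely contain arrays, nested structs and preprocessor lines, which is where a generated parser starts to pay off over this kind of hand-rolled code.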

Martin v. Löwis
A: 

You can use yacc or ANTLR, add appropriate parsing rules, populate some data structure (a tree, maybe) while parsing, then traverse the data structure and do whatever you like.
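The "traverse the data structure" half might look roughly like this, using a flat list of fields as the simplest possible tree. `FieldDesc` and `decodeRecord` are hypothetical names, and the sketch assumes big-endian fields of 1 to 8 bytes with no padding between them.

```cpp
#include <cstddef>
#include <cstdint>
#include <cstdio>
#include <string>
#include <vector>

// One node of the parsed description: a field name and its width in bytes.
struct FieldDesc { std::string name; std::size_t size; };

// Walk the field list and format each value as zero-padded hex.
// Returns an empty string if a field is malformed or the data is too short.
std::string decodeRecord(const std::vector<FieldDesc>& fields,
                         const std::vector<std::uint8_t>& raw) {
    std::string text;
    std::size_t pos = 0;
    for (const FieldDesc& f : fields) {
        if (f.size == 0 || f.size > 8 || raw.size() - pos < f.size) return std::string();
        unsigned long long value = 0;
        for (std::size_t i = 0; i < f.size; ++i) value = (value << 8) | raw[pos + i];
        char hex[32];
        std::snprintf(hex, sizeof hex, "0x%0*llX", static_cast<int>(2 * f.size), value);
        if (!text.empty()) text += ", ";
        text += f.name + ": " + hex;
        pos += f.size;
    }
    return text;
}
```

Interpreting the description at run time like this is slower than generated code, but new message types only need a new description, not a rebuild.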

sourabh jaiswal
A: 

I'm going to assume that all you need to do is format the records and output them.

Use a custom code generator. The generated code will look something like this:

typedef struct { word messageid; } Header;

//repeated for each record type
typedef struct {
    word messageid;
    // <members here>
} Record_##;
//END


void Process(Input inp, Output out) {
    char buffer[BIG_ENOUGH];
    char *offset;

    offset = &buffer[BIG_ENOUGH];

    while(notEnd) {
        if(&offset[sizeof(LargestStruct)] >= &buffer[BIG_ENOUGH]) {
            // move remaining buffer to start, fill tail from inp, and reset offset
        }

        Header *hpt = (Header*)offset;

        switch(hpt->messageid)
        {
            //repeated for each record type
            case <record ID for given type>:
            {
                Record_##* rpt = (Record_##*)offset;
                out.format("name1: %t, ...\n", rpt->name1, ...);
                offset += sizeof(Record_##);
                break;
            }
            //END
        }
    }
}

Most of that's boilerplate, so writing a program to generate it shouldn't be too hard.
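A generator for the repeated `case` block could be as small as the sketch below. `RecordSpec` and `emitCase` are hypothetical names, and the emitted code assumes every member is a 32-bit value printed with `printf` and `%08X`, which is a simplification of the `out.format` call in the template above.

```cpp
#include <string>
#include <vector>

// Description of one record type, as the generator's input.
struct RecordSpec {
    std::string id;                   // case label, e.g. "0x01010101"
    std::string typeName;             // struct name, e.g. "Record_1"
    std::vector<std::string> fields;  // member names to print
};

// Emit the C source of the "case" block for one record type.
std::string emitCase(const RecordSpec& r) {
    std::string fmt, args;
    for (const std::string& f : r.fields) {
        if (!fmt.empty()) fmt += ", ";
        fmt += f + ": 0x%08X";  // assumes a 32-bit member
        args += ", rpt->" + f;
    }
    std::string code;
    code += "case " + r.id + ": {\n";
    code += "    const " + r.typeName + "* rpt = (const " + r.typeName + "*)offset;\n";
    code += "    printf(\"" + fmt + "\\n\"" + args + ");\n";
    code += "    offset += sizeof(" + r.typeName + ");\n";
    code += "    break;\n";
    code += "}\n";
    return code;
}
```

Calling `emitCase` once per record type and writing the results between the `switch` braces gives the body of `Process`. Note that casting the buffer to a struct pointer only matches the wire format if the structs are packed the same way the sender packed them.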

If you need more processing, I think this idea could be tweaked some to make that work as well.


Edit: after re-reading the question, it looks like you might have the structs defined already. In that case you can just #include them and use them directly. However, you then end up with the issue of how to parse the structs to generate the input to the formatting function. Awk or sed might be handy there.

BCS