
Is there a way to deserialize, unmarshal, or otherwise parse a byte array back into a structure when you don't know what that structure was in the first place? The structure probably came from C++.

Some background: I have a flight simulator for R/C planes and I'm trying to figure out if I can automate it. There is no API. I know how to automate the input. I'm trying to get at the output of the program (flight dynamics of the plane, etc.).

The simulator has a multi-player mode, so I know it has to pass the exact info I'm looking for over the network. It's built on DirectX 9 and uses DirectPlay (a deprecated DirectX networking API for games) for multi-player communication. My guess is the simulator itself is written in C++.

So, I can actually connect to the program and have received a 13-byte message. Great. Now what?

In general, how would one reverse-engineer something like this?

A: 

First, there are probably 20 or more different message types, and you will need to mimic the original client/server. You could write a proxy that sits between the server and the client and captures every legitimate packet the two exchange.
Second, if you want to reverse-engineer this, you could disassemble the simulator and look for the code that fills in those packets; there may be hints there. Another option is to capture many packets of the same type and analyse them: for example, if only 2 or 4 bytes change when your plane moves along some axis, that field is probably responsible for that axis.
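A minimal sketch of that comparison in C, assuming you have already saved two captures of the same message type as equal-length byte arrays (the capture step itself is not shown). `diff_packets` and `print_diff` are hypothetical helper names, not part of any library:

```c
#include <stddef.h>
#include <stdio.h>

/* Compare two captures of the same message type (assumed to have the
   same length) and record the offsets whose bytes differ. Returns the
   number of differing offsets; at most max_out of them go into out. */
size_t diff_packets(const unsigned char *a, const unsigned char *b,
                    size_t len, size_t *out, size_t max_out)
{
    size_t n = 0;
    for (size_t i = 0; i < len; i++) {
        if (a[i] != b[i]) {
            if (n < max_out)
                out[n] = i;
            n++;
        }
    }
    return n;
}

/* Print each differing offset with its old and new byte value. */
void print_diff(const unsigned char *a, const unsigned char *b, size_t len)
{
    for (size_t i = 0; i < len; i++)
        if (a[i] != b[i])
            printf("offset 0x%02zX: 0x%02X -> 0x%02X\n", i, a[i], b[i]);
}
```

For example, capture one packet with the stick centred and one with it deflected, then run `print_diff` on the pair; offsets that change consistently across several such pairs are candidates for the axis field.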
Before you start playing with it, consider:
1. Do you really want to take this on? It isn't going to be easy.
2. Are you sure nobody has done this before? Some protocols have been worked out and published by fans (or fanatics?). Ultima Online is an example: there are about 10 server emulators that implement most of the original server's features, and many guides on how the protocol works.

Ravadre
I was thinking the same thing: do a bunch of test flights and capture the data with a packet sniffer.
Patrick
A:

If you can access the struct's address, you can at least grab a byte-dump of what's in there for a start. Here's the 5-minute hack I made:

#include <ctype.h>
#include <stdio.h>

typedef struct {
    char c1;
    char c2;
    int i;
    float f;
    char *str;
} unknown;

/* Print each byte with its offset, its hex value, and the character
   it represents ('.' when the byte is not printable). */
void decode(const unsigned char *address, size_t len) {
    for (size_t off = 0; off < len; off++) {
        unsigned char b = address[off];
        printf("Byte offset: 0x%02zX\tByte: 0x%02X\tAscii: %c\n",
               off, b, isprint(b) ? b : '.');
    }
}

int main(void) {
    unknown x;  /* padding bytes are never written, so they keep leftover stack data */
    size_t len = sizeof(unknown); /* or 13 like you've said the size is */

    /* this would happen in whatever software
       you're using to generate the struct */
    x.c1 = 'h';
    x.c2 = 'i';
    x.i = 25;
    x.f = 3.14f;
    x.str = "Hello";

    printf("first x:\n");
    decode((const unsigned char *)&x, len);

    x.c1 = 'o';
    x.c2 = 'l';
    x.i = 255;
    x.f = -9.0f;
    x.str = "Goodbye";

    printf("second x:\n");
    decode((const unsigned char *)&x, len);

    return 0;
}

And here's the output from a 32-bit build, where sizeof(unknown) is 16 (on a typical 64-bit build the 8-byte pointer makes it 24):

first x:
Byte offset: 0x00	Byte: 0x68	Ascii: h
Byte offset: 0x01	Byte: 0x69	Ascii: i
Byte offset: 0x02	Byte: 0xF3	Ascii: .
Byte offset: 0x03	Byte: 0xB7	Ascii: .
Byte offset: 0x04	Byte: 0x19	Ascii: .
Byte offset: 0x05	Byte: 0x00	Ascii: .
Byte offset: 0x06	Byte: 0x00	Ascii: .
Byte offset: 0x07	Byte: 0x00	Ascii: .
Byte offset: 0x08	Byte: 0xC3	Ascii: .
Byte offset: 0x09	Byte: 0xF5	Ascii: .
Byte offset: 0x0A	Byte: 0x48	Ascii: H
Byte offset: 0x0B	Byte: 0x40	Ascii: @
Byte offset: 0x0C	Byte: 0xD8	Ascii: .
Byte offset: 0x0D	Byte: 0x85	Ascii: .
Byte offset: 0x0E	Byte: 0x04	Ascii: .
Byte offset: 0x0F	Byte: 0x08	Ascii: .
second x:
Byte offset: 0x00	Byte: 0x6F	Ascii: o
Byte offset: 0x01	Byte: 0x6C	Ascii: l
Byte offset: 0x02	Byte: 0xF3	Ascii: .
Byte offset: 0x03	Byte: 0xB7	Ascii: .
Byte offset: 0x04	Byte: 0xFF	Ascii: .
Byte offset: 0x05	Byte: 0x00	Ascii: .
Byte offset: 0x06	Byte: 0x00	Ascii: .
Byte offset: 0x07	Byte: 0x00	Ascii: .
Byte offset: 0x08	Byte: 0x00	Ascii: .
Byte offset: 0x09	Byte: 0x00	Ascii: .
Byte offset: 0x0A	Byte: 0x10	Ascii: .
Byte offset: 0x0B	Byte: 0xC1	Ascii: .
Byte offset: 0x0C	Byte: 0xE7	Ascii: .
Byte offset: 0x0D	Byte: 0x85	Ascii: .
Byte offset: 0x0E	Byte: 0x04	Ascii: .
Byte offset: 0x0F	Byte: 0x08	Ascii: .

The assumption here is that you know what went into the data; you just don't know the layout, or necessarily everything that is in it.

This is pretty difficult even when we do know what's in there. The chars are the easiest to decode, clearly. We can see we went from 'hi' to 'ol' right at the beginning.

Next is the int, changing from 25 to 255. We can see 0x19 and 0xFF at offset 0x4, and the int occupies offsets 0x4-0x7: 0x19 0x00 0x00 0x00 is 25 stored little-endian, least significant byte first (the "backwards" order x86 uses). Offsets 0x2-0x3 are padding: C compilers typically align an int to a 4-byte boundary, so two unused bytes follow the two chars. Nothing ever writes them, so they keep whatever was already in that memory (0xF3 0xB7 in both dumps).
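A sketch of both observations, assuming the same `unknown` struct as in the code above: `offsetof` shows where the compiler actually placed each field (so the padding becomes visible), and `read_le32` (a hypothetical helper) rebuilds a little-endian 32-bit value byte by byte, so it gives the same result regardless of the byte order of the machine running it:

```c
#include <stddef.h>
#include <stdint.h>
#include <stdio.h>

/* Same layout as the struct being dumped. */
typedef struct {
    char c1;
    char c2;
    int i;
    float f;
    char *str;
} unknown;

/* Show the offset the compiler chose for each field, plus total size. */
void print_layout(void)
{
    printf("c1 %zu, c2 %zu, i %zu, f %zu, str %zu, size %zu\n",
           offsetof(unknown, c1), offsetof(unknown, c2),
           offsetof(unknown, i), offsetof(unknown, f),
           offsetof(unknown, str), sizeof(unknown));
}

/* Assemble four bytes stored least significant first (little-endian).
   Shifting avoids the alignment and aliasing problems of casting the
   buffer to uint32_t *. */
uint32_t read_le32(const unsigned char *buf, size_t offset)
{
    const unsigned char *p = buf + offset;
    return (uint32_t)p[0]
         | (uint32_t)p[1] << 8
         | (uint32_t)p[2] << 16
         | (uint32_t)p[3] << 24;
}
```

On common x86 compilers `print_layout` reports `i` at offset 4, matching the dump, and `read_le32` at offset 4 of the two dumps gives 25 and 255.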

Then there's a float. Floats are stored in IEEE 754 single-precision format: 1 sign bit, 8 exponent bits and 23 mantissa bits. Reading offsets 0x8-0xB little-endian gives 0x4048F5C3 for 3.14 and 0xC1100000 for -9, which are exactly the IEEE 754 encodings of those two values.
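A sketch of that decoding in C, assuming the host `float` is IEEE 754 binary32 and shares byte order with `uint32_t` (true on x86 and ARM). `read_le_float` and `float_fields` are hypothetical names; `memcpy` is used instead of a pointer cast so the reinterpretation does not break C's aliasing rules:

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Reinterpret four little-endian bytes as a binary32 float. The bytes
   are assembled with shifts first, so the result does not depend on
   the host's byte order; memcpy then reuses the bit pattern as-is. */
float read_le_float(const unsigned char *buf, size_t offset)
{
    const unsigned char *p = buf + offset;
    uint32_t bits = (uint32_t)p[0] | (uint32_t)p[1] << 8
                  | (uint32_t)p[2] << 16 | (uint32_t)p[3] << 24;
    float f;
    memcpy(&f, &bits, sizeof f);
    return f;
}

/* Split a binary32 bit pattern into sign, biased exponent, mantissa. */
void float_fields(uint32_t bits, unsigned *sign, unsigned *exponent,
                  uint32_t *mantissa)
{
    *sign = bits >> 31;
    *exponent = (bits >> 23) & 0xFFu;
    *mantissa = bits & 0x7FFFFFu;
}
```

Applied to the two dumps, `read_le_float` at offset 8 returns 3.14f and -9.0f. Splitting 0xC1100000 gives sign 1, biased exponent 130 (2^3 after subtracting the bias of 127) and mantissa 0x100000 (a fraction of .125), so the value is -1.125 × 8 = -9.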

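Finally, we close with a pointer at offsets 0xC-0xF: read little-endian, it is 0x080485D8 in the first dump and 0x080485E7 in the second, the addresses of the string literals "Hello" and "Goodbye". If there are pointers in the struct you're decoding, you'll have to try to look up those memory addresses without segfaulting your program. They might be pointers to other structs, in which case you'll have the joy of repeating this process. Keep in mind that a pointer value captured from network traffic only means something inside the process that sent it.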

Like I said, this was my 5-minute take on it, and I have never actually tried this out before. It was mainly my first guess at how you'd go about this: start with known inputs, then change one thing at a time until you can determine the data type stored in the struct and the corresponding byte offset.
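The "start with known inputs" step can be partly automated. A sketch, assuming you set a value yourself (for example an int or float you entered in the program) and have a byte dump or captured packet to search: scan every offset and report where that value appears as a little-endian int32 or as an IEEE 754 float. The function names are hypothetical, and exact float matching only works when you know the exact value that was stored:

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Assemble four bytes, least significant first, into a 32-bit value. */
static uint32_t le32_at(const unsigned char *p)
{
    return (uint32_t)p[0] | (uint32_t)p[1] << 8
         | (uint32_t)p[2] << 16 | (uint32_t)p[3] << 24;
}

/* Record every offset whose 4 bytes, read little-endian, equal pattern.
   Returns the match count; at most max_out offsets are written to out. */
static size_t find_pattern32(const unsigned char *buf, size_t len,
                             uint32_t pattern, size_t *out, size_t max_out)
{
    size_t n = 0;
    for (size_t i = 0; i + 4 <= len; i++) {
        if (le32_at(buf + i) == pattern) {
            if (n < max_out)
                out[n] = i;
            n++;
        }
    }
    return n;
}

/* Search for a known 32-bit int (two's complement bit pattern). */
size_t find_int32(const unsigned char *buf, size_t len, int32_t value,
                  size_t *out, size_t max_out)
{
    return find_pattern32(buf, len, (uint32_t)value, out, max_out);
}

/* Search for a known float by its exact binary32 bit pattern
   (assumes the host float is IEEE 754, as on x86 and ARM). */
size_t find_float(const unsigned char *buf, size_t len, float value,
                  size_t *out, size_t max_out)
{
    uint32_t bits;
    memcpy(&bits, &value, sizeof bits);
    return find_pattern32(buf, len, bits, out, max_out);
}
```

Run on the first dump above, `find_int32` for 25 reports offset 4 and `find_float` for 3.14f reports offset 8. A value that turns up at several offsets is ambiguous; change it and search again to narrow it down.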

Mark Rushakoff
Very helpful! Thank you.
Patrick