Hey

I am parsing a binary file using a specification. The file comes in big-endian mode because it has streamed packets accumulated. I have to reverse the length of the packets in order to "reinterpret_cast" them into the right variable type. (I am not able to use the net/inet.h functions because the packets have different lengths.)

The read() method of the ifstream class puts the bytes inside an array of char pointers. I tried to do the reversal by hand using a function, but I cannot figure out how to pass the "list of pointers" in order to change their position in the array.

If someone knows a more efficient way to do so, please let me know (8 GB of data needs to be parsed).

#include <iostream>
#include <fstream>

void reverse(char &array[]);

using namespace std;

int main ()
{
    char *a[5];
    *a[0]='a'; *a[1]='b'; *a[2]='c'; *a[3]='d'; *a[4]='e';

    reverse(a);

    int i=0;
    while(i<=4)
    {
        cout << *a[i] << endl;
        i++;
    }
    return 0;
}
void reverse(char &array[])
{
    int size = sizeof(array[])+1;
    //int size = 5;
    cout << "ARRAY SIZE: " << size << endl;

    char aux;
    for (int i=0;i<size/2;i++)
    {
            aux=array[i];
            array[i]=array[size-i-1];
            array[size-i-1]=aux;
    }
}

Thanks all of you for your help!

A: 

Ok, after your comment I understand what you are after. So you need to change the endianness of a field that is 6 bytes wide.

I think this article should help you, as well as this question on SO; it shows how to implement conversions in different ways, the fastest being a bitwise implementation. It shows no implementation for a six byte wide field, but an analogous solution can easily be made.
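
As a rough idea of what such an analogous bitwise solution could look like, here is a minimal sketch. The name readBigEndian48 is made up for illustration and does not come from the linked article or question. It assumes the six bytes are stored most significant byte first, and it takes unsigned char so that bytes above 0x7F are not sign-extended (a buffer filled by ifstream::read() is a char array, so it would need a cast).

```cpp
#include <cstddef>
#include <cstdint>

// Hypothetical helper (not from the linked sources): decode a 6-byte
// big-endian field into a 64-bit unsigned integer using shifts only.
// The result does not depend on the byte order of the host machine.
std::uint64_t readBigEndian48(const unsigned char* p)
{
    std::uint64_t value = 0;
    for (std::size_t k = 0; k < 6; ++k)
        value = (value << 8) | p[k];   // most significant byte comes first
    return value;
}
```

For the bytes 01 02 03 04 05 06 this yields 0x010203040506. Because the value is built arithmetically, no separate swap step is needed afterwards.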

I suggest copying your length field into a 64-bit integer and then implementing a custom function to swap the relevant 6 bytes. Get rid of all the char pointers in any case...;)

If you are compiling on VC++ there is this function: _byteswap_uint64. Paste your 6 bytes into the high end of this uint64, call this function and hopla, you are done.

edit at 4:12 am (I must be getting very addicted to stackoverflow)

#include <iostream>
#include <stdlib.h>   // _byteswap_uint64 (VC++ only)
#include <string.h>   // memcpy

typedef unsigned char    byte;
typedef unsigned __int64 uint64_t; // VC++ only; elsewhere use #include <stdint.h> instead

// in case you are not compiling with VC++ use this custom function
// It can swap data of any size. Adapted from:
// http://stackoverflow.com/questions/2182002/convert-big-endian-to-little-endian-in-c-without-using-provided-func/2182581#2182581
// see: http://en.wikipedia.org/wiki/XOR_swap_algorithm

void
swapBytes( void* v, size_t n )
{
   byte* in = (byte*) v;

   for( size_t lo=0, hi=n-1; hi>lo; ++lo, --hi )

      in[lo] ^= in[hi]
   ,  in[hi] ^= in[lo]
   ,  in[lo] ^= in[hi] ;
}

// no trailing semicolon here, so SWAP( x ); reads like a normal statement
#define SWAP(x) swapBytes( &(x), sizeof(x) )


int
main()
{
   // the length field as it is stored in the file (big-endian).
   // You will have to read it from file to memory.
   byte length[6] = { 0x01, 0x02, 0x03, 0x04, 0x05, 0x06 };

   // ok, you have read it from file, now get it in an uint64_t.
   // Copy only the 6 bytes: casting length to uint64_t* would read
   // 8 bytes from a 6 byte array.
   uint64_t i = 0;
   memcpy( &i, length, sizeof(length) );

   i <<= 16; // on a little-endian machine: move the 6 bytes to the high end.

   std::cout << std::hex << i                     << std::endl;
   std::cout << std::hex << _byteswap_uint64( i ) << std::endl;

   // generic swapping function
   SWAP( i );
   std::cout << std::hex << i                     << std::endl;

   std::cin.get();
   return 0;
}

// Outputs:
// 605040302010000
// 10203040506
// 10203040506
ufotds
Thanks for the answers and the code! I found this article that may be interesting: http://blogs.sun.com/DanX/entry/optimizing_byte_swapping_for_fun
emerrf
+2  A: 

Not quite.

The file comes in big-endian mode because it has streamed packets accumulated. I have to reverse the length of the packets in order to "reinterpret_cast" them into the right variable type.

You need to reverse the bytes on the level of stored data, not the file and not the packets.

For example, suppose a file stores this struct:

struct S {
  int i;
  double d;
  char c;
};

to read the struct you will need to reverse:

int: [4321]->[1234]  // sizeof(int) == 4, swap the order of 4 bytes
double: [87654321]->[12345678]  // sizeof(double) == 8, swap the order of 8 bytes
char: [1]->[1]  // sizeof(char) == 1, swap 1 byte (no swapping needed)

Not the entire struct at once.

Unfortunately, it's not as trivial as just reversing the block of data in the file, or the file itself. You need to know exactly what data type is being stored, and reverse the bytes in it.
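
To make the field-by-field idea concrete, here is a minimal sketch for the struct S above. It rests on assumptions that are not stated in the question: the record is stored packed as 13 bytes (4-byte int, 8-byte IEEE-754 double, 1-byte char), every multi-byte field is big-endian, and the host uses the same sizes. The helper names loadBigEndian and readS are made up for illustration.

```cpp
#include <cstdint>
#include <cstring>

struct S {
    int i;
    double d;
    char c;
};

static_assert(sizeof(int) == 4 && sizeof(double) == 8,
              "sketch assumes a 4-byte int and an 8-byte double");

// Hypothetical helper: assemble an n-byte big-endian unsigned value (n <= 8).
static std::uint64_t loadBigEndian(const unsigned char* p, int n)
{
    std::uint64_t v = 0;
    for (int k = 0; k < n; ++k)
        v = (v << 8) | p[k];
    return v;
}

// Assumed layout: bytes 0-3 hold i, bytes 4-11 hold d, byte 12 holds c.
S readS(const unsigned char* buf)
{
    S s;
    const std::uint32_t ibits = static_cast<std::uint32_t>(loadBigEndian(buf, 4));
    const std::uint64_t dbits = loadBigEndian(buf + 4, 8);
    std::memcpy(&s.i, &ibits, sizeof s.i);   // reinterpret the bits as int
    std::memcpy(&s.d, &dbits, sizeof s.d);   // reinterpret the bits as double
    s.c = static_cast<char>(buf[12]);        // single byte, nothing to swap
    return s;
}
```

Each field is converted on its own, using its own width, which is exactly why the layout of the record has to be known in advance.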

The functions in inet.h are used for exactly this purpose, so I encourage you to use them.
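
As a small illustration, here is how a 32-bit field could be converted with ntohl. This is a sketch that assumes a POSIX system, where the function is declared in <arpa/inet.h>; the helper name readU32 is made up. Note that these functions only exist for 16-bit and 32-bit values, so they do not directly cover a 6-byte field.

```cpp
#include <arpa/inet.h>   // POSIX header declaring ntohl/ntohs
#include <cstdint>
#include <cstring>

// Hypothetical helper: read a 32-bit big-endian field from a raw buffer.
std::uint32_t readU32(const char* buf)
{
    std::uint32_t raw;
    std::memcpy(&raw, buf, sizeof raw);  // copy the bytes exactly as stored
    return ntohl(raw);                   // network (big-endian) to host order
}
```

On a big-endian host ntohl does nothing, so the same code gives the right value on either kind of machine.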

So, that brings us to C strings. If you're storing C strings in a file, do you need to swap their endianness? Well, a C string is a sequence of 1 byte chars. You don't need to swap 1 byte chars, so you don't need to swap the data in a C string!

If you really want to swap 6 bytes, you can use the std::reverse function:

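char in[7] = "abcdef";  // 6 data bytes plus a terminating '\0' so cout can print them
cout << in << endl;  // shows abcdef 
std::reverse(in, in+6);  // from <algorithm>; reverses only the 6 data bytes
cout << in << endl;  // shows fedcba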

If you're doing this on any large scale (a large number of types), then you may want to consider writing a code generator that generates these byte swapping functions (and file reading functions). It's not too hard, as long as you can find a tool to parse the structs in C (I've used gcc-xml for this, or maybe clang would help).

This makes serialization a harder problem. If it's in your power, you may want to consider using XML or Google's protocol buffers to solve these problems for you.

Stephen
this is a pretty good explanation. As far as I understand it, he/she only needs to convert the length field.
ufotds
yes, the std::reverse would definitely be the recommended way.
ufotds