I have binary data containing a mix of uint32 values and null-terminated strings. I know the size of an individual data set (each set of data shares the same format), but not the actual format.

I've been using unpack to read the data with the following functions:

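function read_uint32( $fh ){
  // 'L' is an unsigned 32-bit integer in machine byte order; use 'V'
  // (little-endian) or 'N' (big-endian) if the file's byte order is fixed.
  $bytes = fread( $fh, 4 );
  $value = unpack( 'L', $bytes );
  return $value[1];
}

function read_string( $fh ){
  $return_string = '';
  // Collect bytes until the null terminator; also stop at end of file
  // so a missing terminator does not cut off the last real character.
  while( ( $char = fread( $fh, 1 ) ) !== false && $char !== '' && $char !== "\0" ){
    $return_string .= $char;
  }
  return $return_string;
}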

I then basically try both functions and see whether the data makes sense as a string; if it doesn't, it's probably an int. Is there an easier way to go about this?

Thanks.

+1  A: 

Well, I think your approach is okay. If you only get ASCII strings it's quite easy, as the highest bit of each byte will always be 0 (or 1 in some strange cases...). Analyzing some bytes from the file and looking at their distribution will probably tell you whether it's ASCII or something binary.

If you have a different encoding like UTF-8 or something, it's really a pain. You could look for recurring CR/LF chars, or filter the range 0-31 so that only tab, CR, LF and FF slip through. Then analyze the first X bytes and compare the ratio of control chars other than tab, CR, LF, FF to everything else. This will work for any ASCII-compatible encoding, as the ASCII range is standardized.

To determine the actual file type, it's probably best to leave that to the OS layer and simply call file from the shell, or use the PHP functions to get the MIME type.
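A rough sketch of the ratio idea above, applied to a single field rather than a whole file. The function names `text_ratio` and `looks_like_string` and the 0.9 threshold are made up for illustration, and treating an empty string as "not a string" is an assumption that may not suit your data:

```php
<?php
// Hypothetical helper: fraction of bytes in $chunk that are printable
// ASCII (0x20-0x7E) or one of tab, LF, FF, CR.
function text_ratio(string $chunk): float {
    $len = strlen($chunk);
    if ($len === 0) {
        return 0.0;
    }
    $textual = 0;
    for ($i = 0; $i < $len; $i++) {
        $byte = ord($chunk[$i]);
        if (($byte >= 0x20 && $byte <= 0x7E)
            || in_array($byte, [0x09, 0x0A, 0x0C, 0x0D], true)) {
            $textual++;
        }
    }
    return $textual / $len;
}

// Hypothetical helper: guess whether the field starting at $offset is a
// null-terminated string. The threshold is an arbitrary assumption.
function looks_like_string(string $data, int $offset, float $threshold = 0.9): bool {
    $end = strpos($data, "\0", $offset);
    if ($end === false || $end === $offset) {
        // No terminator found, or the very first byte is already a null.
        return false;
    }
    return text_ratio(substr($data, $offset, $end - $offset)) >= $threshold;
}
```

For example, `looks_like_string("hello\0" . pack('V', 5), 0)` says the field looks like a string, while the same call on `pack('V', 5) . "hello\0"` does not. It is only a heuristic: a uint32 whose bytes all happen to be printable and are followed by a null byte will still be misread as a string, so checking the guess across several data sets is probably worthwhile.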

Joe Hopfgartner