I'm trying to read data in from a binary file and then store it in a data structure for later use. The issue is that I don't want to have to identify exactly what type it is when I'm just reading it in and storing it. I just want to store the information about what type of data it is and how much data of that type there is (information easily obtained from the first couple of bytes of the data).

But how can I read in just a certain amount of data, disregarding what type it is and still easily be able to cast (or something similar) that data into a readable form later?

My first idea would be to use characters, since all the data I will be looking at will be in byte units.

But if I did something like this:

ifstream fileStream;
fileStream.open("fileName.tiff", ios::binary);
//if I had to read in 4 bytes of data
char memory[4];
fileStream.read((char *)&memory, 4);

But how could I cast these 4 bytes if I later wanted to read them and knew they were a double?

What's the best way to read in data of an unknown type but known size for later use?

+1  A: 

You could copy it to the known data structure which makes life easier later on:

double x;
memcpy (&x,memory,sizeof(double));

or you could just refer to it as a cast value:

if (*((double*)(memory)) == 4.0) {
    // blah blah blah
}

I believe a char* is the best way to read it in, since the size of a char is guaranteed to be 1 unit (not necessarily a byte, but all other data types are defined in terms of that unit, so that, if sizeof(double) == 27, you know that it will fit into a char[27]). So, if you have a known size, that's the easiest way to do it.

paxdiablo
sizeof char is defined to be 1, but that may not mean 1 8-bit byte.
anon
@Pax, Thanks for deleting my comment. </sarcasm>
strager
@Pax, You shouldn't compare doubles using ==! Just thought I'd note that.
strager
@zabzonk, Yes, and? That is not relevant to this answer. If it is, please explain how/why.
strager
Sorry, @strager, I tend to delete minor fixes when I've fixed them since they no longer make sense (and they make the commenter look silly :-). I'll leave 'em all here if you wish. For the record, it was because I'd incorrectly stated "(double)", not "(double*)".
paxdiablo
@strager isn't technical accuracy relevant?
anon
@zabzonk, Ah, my mistake -- I did not notice the paragraph below the second code example.
strager
@Pax, I would have deleted the comment myself, but was surprised it disappeared on its own. =] I don't remember what I wrote in the comment, so I can't defend myself on the (double)/(double*) argument. xD
strager
@zabzonk, made technically accurate, although it didn't have a bearing on the answer since it's the ratio between sizeof(char) and sizeof(double) that matters.
paxdiablo
The code `(*(double*)(memory))` may not work on other platforms that have alignment restrictions, such as ARM.
Ismael
Yes, bods, there are a lot of things to look out for (alignment, wrong size, illegal doubles, comparing floats and no doubt others) but none of them have an immediate bearing on the actual question asked. Otherwise answers would all be 10,000-word essays and a lot less useful :-)
paxdiablo
+2  A: 

I think a reinterpret_cast will give you what you need. If you have a char * to the bytes you can do the following:

double * x = reinterpret_cast<double *>(dataPtr);

Check out Type Casting on cplusplus.com for a more detailed description of reinterpret_cast.

mabbit
+1  A: 

You can use structures and anonymous unions:

struct Variant
{
    size_t size;

    enum
    {
        TYPE_DOUBLE,
        TYPE_INT,
    } type;

    union
    {
        char raw[0];  // Copy to here. *

        double asDouble;
        int asInt;
    };
};

Optional: Create a table of type => size, so you can find the size given the type at runtime. This is only needed when reading.

    static unsigned char typeSizes[2] =
    {
        sizeof(double),
        sizeof(int),
    };

Usage:

Variant v;
v.type = Variant::TYPE_DOUBLE;
v.size = Variant::typeSizes[v.type];
fileStream.read(v.raw, v.size);

printf("%f\n", v.asDouble);

You will probably receive warnings about type punning. Read: Doing this is not portable and against the standard! Then again, so is reinterpret_cast, C-style casting, etc.

Note: First edit, I did not read your original question. I only had the union, not the size or type part.

*This is a neat trick I learned a long time ago. Basically, raw doesn't take up any bytes (thus doesn't increase the size of the union), but provides a pointer to a position in the union (in this case, the beginning). It's very useful when describing file structures:

struct Bitmap
{
    // Header stuff.
    uint32_t dataSize;

    RGBPixel data[0];
};

Then you can just fread the data into a Bitmap. =]
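For comparison, the same tagged-value idea can be sketched without the union or the zero-length array by keeping a fixed raw buffer and copying out with memcpy. This is only a sketch under assumptions: `RawValue`, `size_of_type`, `make_raw`, `as_double` and `as_int` are names invented here, and the bytes are assumed to already be in host byte order.

```cpp
#include <cstddef>
#include <cstring>

// Hypothetical tagged record: raw bytes plus a tag saying how to read them.
struct RawValue {
    enum Type { TYPE_DOUBLE, TYPE_INT };

    Type type;
    std::size_t size;
    // Big enough for the largest supported type.
    char raw[sizeof(double) > sizeof(int) ? sizeof(double) : sizeof(int)];
};

// Size in bytes of the value a given tag describes.
std::size_t size_of_type(RawValue::Type type) {
    return type == RawValue::TYPE_DOUBLE ? sizeof(double) : sizeof(int);
}

// Store the bytes now; decide how to interpret them later.
RawValue make_raw(RawValue::Type type, const char* bytes) {
    RawValue v;
    v.type = type;
    v.size = size_of_type(type);
    std::memset(v.raw, 0, sizeof v.raw);
    std::memcpy(v.raw, bytes, v.size);
    return v;
}

// Copy the stored bytes into a real double (assumes host byte order).
double as_double(const RawValue& v) {
    double result;
    std::memcpy(&result, v.raw, sizeof result);
    return result;
}

// Copy the stored bytes into a real int (assumes host byte order).
int as_int(const RawValue& v) {
    int result;
    std::memcpy(&result, v.raw, sizeof result);
    return result;
}
```

When reading from a file, `fileStream.read(v.raw, v.size)` could fill the buffer directly in place of the memcpy in `make_raw`. The trade-off is an extra copy on access in exchange for not relying on type punning through a union.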

strager
+1  A: 

You could store the data in a class that provides functions to cast it to the possible result types, like this:

enum data_type {
  TYPE_DOUBLE,
  TYPE_INT
};

class data {
public:
  data_type type;
  size_t len;
  char *buffer;

  data(data_type a_type, char *a_buffer, size_t a_len)
      : type(a_type), len(a_len), buffer(NULL) {  // same order as the member declarations
    buffer = new char[a_len];
    memcpy(buffer, a_buffer, a_len);
  }
  ~data() {
    delete[] buffer;
  }

  double as_double() {
    assert(TYPE_DOUBLE == type);
    assert(len >= sizeof(double));
    return *reinterpret_cast<double*>(buffer);
  }

  int as_int() {...}
};

Later you would do something like this:

data d = ...;
switch (d.type) {
case TYPE_DOUBLE:
   something(d.as_double());
   break;
case TYPE_INT:
   something_else(d.as_int());
   break;
...
}

That's at least how I'm doing these kinds of things :)

sth
+1  A: 

Be careful. In most environments I'm aware of, doubles are 8 bytes, not 4; reinterpret_casting memory to a double will result in junk, depending on what the four bytes following memory contain. If you want a 32-bit floating point value, you probably want a float (though I should note that the C++ standard does not require that float and double be represented in any particular way, and in particular they need not be IEEE-754 compliant).

Also, your code will not be portable unless you take endianness into account in your code. I see that the TIFF format has an endianness marker in its first two bytes that should tell you whether you're reading in big-endian or little-endian values.
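To make the endianness point concrete, here is a small sketch of checking the TIFF byte-order marker ("II" for little-endian, "MM" for big-endian) and then decoding a 32-bit unsigned value byte by byte, so the result does not depend on the host's own byte order. The names `ByteOrder`, `tiff_byte_order` and `decode_u32` are hypothetical, chosen only for this example.

```cpp
#include <cstdint>

// Hypothetical enum describing the byte order declared by the file.
enum class ByteOrder { Little, Big, Unknown };

// Inspect the first two bytes of a TIFF file: "II" or "MM".
ByteOrder tiff_byte_order(const unsigned char* header) {
    if (header[0] == 'I' && header[1] == 'I') return ByteOrder::Little;
    if (header[0] == 'M' && header[1] == 'M') return ByteOrder::Big;
    return ByteOrder::Unknown;
}

// Assemble a 32-bit unsigned value from 4 bytes in the given order.
// Shifting individual bytes works the same on any host.
std::uint32_t decode_u32(const unsigned char* bytes, ByteOrder order) {
    const std::uint32_t b0 = bytes[0];
    const std::uint32_t b1 = bytes[1];
    const std::uint32_t b2 = bytes[2];
    const std::uint32_t b3 = bytes[3];
    if (order == ByteOrder::Big) {
        return (b0 << 24) | (b1 << 16) | (b2 << 8) | b3;
    }
    // Treat anything else as little-endian in this sketch.
    return (b3 << 24) | (b2 << 16) | (b1 << 8) | b0;
}
```

Floating point values need one more step: decode the bits into an integer of the same width this way, then copy those bits into a float or double, which still assumes the host uses the same floating point format as the file.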

So I would write a function with the following prototype:

template<typename VALUE_TYPE> VALUE_TYPE convert(char* input);

If you want full portability, specialize the template and have it actually interpret the bits in input. Otherwise, you can probably get away with e.g.

template<typename VALUE_TYPE> VALUE_TYPE convert(char* input) {
  // Assumes input is suitably aligned and already in host byte order.
  return *reinterpret_cast<VALUE_TYPE*>(input);
}
ruds