Hello everyone,

Every time I try to read a file from the hard drive and cast the data into a structure, I end up with the data not casting properly. Does reinterpret_cast() require the number of bytes in a structure to be a multiple of 4? If not, what am I doing wrong? If so, how do I get around that?

My structure looks like this (the records in the file are 50-byte chunks):

class stlFormat
{
public:

    float normalX, normalY, normalZ;
    float x1,y1,z1;
    float x2,y2,z2;
    float x3,y3,z3;

    char byte1, byte2;
};

Rest of my code:

void main()
{

int size;
int numTriangles;

int * header = new int [21]; // size of header

ifstream stlFile ("tetrahedron binary.STL", ios::in|ios::binary|ios::ate);

size = stlFile.tellg(); // get the size of file

stlFile.seekg(0, ios::beg); //read the number of triangles in the file
stlFile.read(reinterpret_cast<char*>(header), 84);

numTriangles = header[20];

stlFormat * triangles = new stlFormat [numTriangles]; //create data array to hold vertex data

stlFile.seekg (84, ios::beg); //read vertex data and put them into data array
stlFile.read(reinterpret_cast<char*>(triangles), (numTriangles * 50));

cout << "number of triangles: " << numTriangles << endl << endl;

for (int i = 0; i < numTriangles; i++)
{
    cout << "triangle " << i + 1 << endl;
    cout << triangles[i].normalX << " " << triangles[i].normalY << " " << triangles[i].normalZ << endl;
    cout << triangles[i].x1 << " " << triangles[i].y1 << " " << triangles[i].z1 << endl;
    cout << triangles[i].x2 << " " << triangles[i].y2 << " " << triangles[i].z2 << endl;
    cout << triangles[i].x3 << " " << triangles[i].y3 << " " << triangles[i].z3 << endl << endl;
}

stlFile.close();
getchar();
}

Just for you, John, although it's rather incomprehensible. It's in hex format.

73 6f 6c 69 64 20 50 61 72 74 33 20 20 20 20 20 20 20 20 20 20 20 20 20 20 20 20 20 20 20 20 20 20 20 20 20 20 20 20 20 20 20 20 20 20 20 20 20 20 20 20 20 20 20 20 20 20 20 20 20 20 20 20 20 20 20 20 20 20 20 20 20 20 20 20 20 20 20 20 20 04 00 00 00 ec 05 51 bf ab aa aa 3e ef 5b f1 be 00 00 00 00 00 00 00 00 f3 f9 2f 42 33 33 cb 41 80 e9 25 42 9a a2 ea 41 33 33 cb 41 00 00 00 00 00 00 00 00 00 00 00 00 00 00 ab aa aa 3e ef 5b 71 3f 33 33 4b 42 00 00 00 00 f3 f9 2f 42 33 33 cb 41 80 e9 25 42 9a a2 ea 41 00 00 00 00 00 00 00 00 f3 f9 2f 42 00 00 ec 05 51 3f ab aa aa 3e ef 5b f1 be 33 33 cb 41 00 00 00 00 00 00 00 00 33 33 cb 41 80 e9 25 42 9a a2 ea 41 33 33 4b 42 00 00 00 00 f3 f9 2f 42 00 00 00 00 00 00 00 00 80 bf 00 00 00 00 33 33 cb 41 00 00 00 00 00 00 00 00 33 33 4b 42 00 00 00 00 f3 f9 2f 42 00 00 00 00 00 00 00 00 f3 f9 2f 42 00 00

+3  A: 

Most likely, float has an alignment of four bytes on your system. Because you use it in your structure, the compiler will make sure that the structure, when allocated using normal methods, always starts at an address that is a multiple of four bytes. Since the raw size of your structure is 4*12+2 = 50 bytes, it needs to be rounded up to the next multiple of four bytes; otherwise, the second element of an array of this structure would be unaligned. So your struct ends up being 52 bytes, which throws off your parsing.
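To check this on a particular compiler, something like the following sketch could be used. It assumes C++11 (for alignof) and a 4-byte float; StlRecord is just a copy of the struct from the question, and kRecordBytes, paddingBytes and printLayout are made-up names for illustration.

```cpp
#include <cstddef>
#include <iostream>

// Copy of the struct from the question (name is hypothetical).
struct StlRecord
{
    float normalX, normalY, normalZ;
    float x1, y1, z1;
    float x2, y2, z2;
    float x3, y3, z3;
    char byte1, byte2;
};

// Size of one triangle record as stored in the binary file.
const std::size_t kRecordBytes = 50;

// Bytes the compiler added beyond the 50 bytes of actual fields
// (assumes float is 4 bytes).
std::size_t paddingBytes()
{
    return sizeof(StlRecord) - kRecordBytes;
}

// Print the in-memory size, alignment and padding of the struct.
void printLayout()
{
    std::cout << "sizeof:  " << sizeof(StlRecord) << '\n'
              << "alignof: " << alignof(StlRecord) << '\n'
              << "padding: " << paddingBytes() << '\n';
}
```

If float is 4 bytes with 4-byte alignment, printLayout() would report a size of 52, an alignment of 4 and 2 bytes of padding, which is the mismatch against the 50-byte records in the file.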

If you need to parse a binary format, it's often a good idea to either use compiler-specific directives to disable alignment, or read one field at a time, to avoid these problems.

For example, on MSVC++, you can use __declspec(align(1)) Edit: Actually __declspec(align(X)) can only increase alignment restrictions. Oops. You'll need to either load one field at a time, or make the padding part of the binary format.

bdonlan
Yeah, I kind of figured that, but is there any way of fixing it? Or will I have to manually loop through the dataset, offsetting and reading data into each individual...err...chunk of class?
Faken
Depends on the compiler. What compiler are you using?
bdonlan
I'm using Visual C++ 2008.
Faken
Added a demonstration (untested) of how to disable alignment on MSVC++ -- Actually, never mind, that doesn't work. Look at John Dibling's answer.
bdonlan
Just curious, what exactly do you mean by it will hurt performance? Do you mean a performance hit when the data is read from the HD into the class, or a performance hit when actually performing operations with the data in the class?
Faken
Performance hit when performing operations on the floats that happen to not be aligned to a 4-byte boundary, or worse, those which straddle a cache line.
bdonlan
Hmm...that's a problem... OK, thank you, I think I'll manually read the data into the class by using a loop and reading offsets. Fast operations on the floats are critical. Thanks for your help, this is voted best answer due to the comments.
Faken
+2  A: 

Instead of fiddling with padding and differences between platforms, maybe have a look at serialization to/from binary files? It might be somewhat less performant than reading data straight into memory, but it's way more extensible.

stijn
+3  A: 

I used my favorite text editor (editpadpro) to save the file you posted in the OP as a binary file called "c:\work\test.bin", edited your code to the following, and it (apparently) produced the correct (expected) output. Please try it out.

#include <cstdlib>
#include <iostream>
#include <fstream>
using namespace std;

#pragma pack( push, 1 )
class stlFormat
{
public:

    float normalX, normalY, normalZ;
    float x1,y1,z1;
    float x2,y2,z2;
    float x3,y3,z3;

    char byte1, byte2;
};
#pragma pack( pop ) 


struct foo
{
    char c, d, e;
};

void main()
{

    size_t sz = sizeof(foo);

int size;
int numTriangles;

int * header = new int [21]; // size of header

ifstream stlFile ("c:\\work\\test.bin", ios::in|ios::binary|ios::ate);

size = stlFile.tellg(); // get the size of file

stlFile.seekg(0, ios::beg); //read the number of triangles in the file
stlFile.read(reinterpret_cast<char*>(header), 84);

numTriangles = header[20];

stlFormat * triangles = new stlFormat [numTriangles]; //create data array to hold vertex data

stlFile.seekg (84, ios::beg); //read vertex data and put them into data array
stlFile.read(reinterpret_cast<char*>(triangles), (numTriangles * 50));

cout << "number of triangles: " << numTriangles << endl << endl;

for (int i = 0; i < numTriangles; i++)
{
    cout << "triangle " << i + 1 << endl;
    cout << triangles[i].normalX << "   " << triangles[i].normalY << "  " << triangles[i].normalZ << endl;
    cout << triangles[i].x1 << "    " << triangles[i].y1 << "   " << triangles[i].z1 << endl;
    cout << triangles[i].x2 << "    " << triangles[i].y2 << "   " << triangles[i].z2 << endl;
    cout << triangles[i].x3 << "    " << triangles[i].y3 << "   " << triangles[i].z3 << endl << endl;
}

stlFile.close();
getchar();
}
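As a sanity check on the packing, a compile-time assertion along these lines could be added. This is only a sketch, assuming a C++11 compiler that honours #pragma pack (MSVC, GCC and Clang do) and a 4-byte float; PackedStlRecord is a hypothetical name for the packed struct above.

```cpp
#include <cstddef>

#pragma pack(push, 1)
// Same fields as stlFormat above, packed so that no padding is added.
struct PackedStlRecord
{
    float normalX, normalY, normalZ;
    float x1, y1, z1;
    float x2, y2, z2;
    float x3, y3, z3;
    char byte1, byte2;
};
#pragma pack(pop)

// Fail the build if the in-memory layout does not match the 50-byte record in the file.
static_assert(sizeof(float) == 4, "this layout assumes a 4-byte float");
static_assert(sizeof(PackedStlRecord) == 50, "record must be exactly 50 bytes");
```

With a check like this, a compiler or setting that ignores the pragma shows up as a build error rather than as garbage in the second triangle.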
John Dibling
It should be noted that this can reduce performance in some cases.
bdonlan
@ bdonlan: Please elaborate.
John Dibling
@ bdonlan: Will it be slower just reading the file (which is very minor to me) or slower when performing operations on the data in the class (which is very serious to me)?
Faken
From MSDN's page on pragma pack: "Note that if you change the alignment of a structure, the structure will not use as much space in memory, but you may see a decrease in performance or even get a hardware-generated exception for unaligned access." x86 won't actually raise an exception, but you may see some degree of performance hit when accessing those floats (not just when reading the file).
bdonlan
No no, the first triangle will always come out correct (because I'm offsetting the read by the header length and then reading the first chunk of data based off that offset). However, after it parses the first one, the second one will be messed up (on my computer it does exactly that: every successive triangle after the first one is messed up). Look at the program output: the values for the other triangles (they are x,y,z coordinates) are unreasonable.
Faken
Sorry, I must leave, but thank you for your help.
Faken
+1  A: 

You should be aware that you are throwing portability out the window with that kind of code: your files may be incompatible with new versions of your program if you compile with a different compiler or for a different system.

That said, you might fix this by using sizeof( int[21] ) and sizeof( stlFormat[ numTriangles ] ) rather than hardcoded sizes in bytes. Reason being, as others noted, the alignment bytes your compiler may or may not add.

If this is a program that other people may use or files might be shared, look up serialization.

Potatoswatter
STL is a standard file format with a fixed number of bytes per triangle; he can't change that.
Darryl
Oh, I thought he meant Standard Template Library.
Potatoswatter
I know I have compatibility issues, but since I'm a mechanical engineering student and not a computer science student, just getting this all to work is a feat on its own! If it works, even on only one computer, I'm happy! (I have, however, taken steps to ensure that all the computers I work on have Intel processors and run the same OS.)
Faken
A: 

I think the problem is not so much the reading of each individual triangle as that the triangle array isn't laid out as you think. There appear to be 50 bytes in each struct, but the allocated memory is almost certainly laid out as if the structs were 52 bytes. Consider reading in each struct individually.

Two more points:

First, there is no such thing as void main in C++. Use int main().

Second, you seem to be leaking memory. You'd be better off in general using the vector facility.

David Thornley
The problem seems to be in the parsing, not in the layout. From what I can figure, when I send the read command, the program yanks the entire stream of data off my hard drive and dumps it into a contiguous chunk of memory without doing anything to it. It then lays down an accessing scheme like a cookie cutter on the data in the form of my structure. The problem is that the cookie cutter is 52 bytes as opposed to 50, due to keeping everything nice and even with the way the CPU accesses memory. My problem is in parsing. BTW, void main() is valid in Visual C++ (but yes, it is non-standard).
Faken
I would call that a layout problem rather than a parsing problem. If you read each 50 bytes into a triangle, I'd expect it to work, which is why I suggested reading the structs individually.
David Thornley
+1  A: 

IMO you really ought to be explicitly reading the triangles directly (deserialization) instead of casting bytes. Doing so will help you avoid portability and performance problems. If you're doing a lot of calculations with those triangles after you read them, the performance hit for using a non-standard memory layout can be non-trivial.

Replace the line "stlFile.read(reinterpret_cast<char*>(triangles), (numTriangles * 50));" with this:

for (int i = 0; i < numTriangles; i++)
{
  stlFile.read((char*)&triangles[i].normalX, sizeof(float));
  stlFile.read((char*)&triangles[i].normalY, sizeof(float));
  stlFile.read((char*)&triangles[i].normalZ, sizeof(float));
  stlFile.read((char*)&triangles[i].x1, sizeof(float));
  stlFile.read((char*)&triangles[i].y1, sizeof(float));
  stlFile.read((char*)&triangles[i].z1, sizeof(float));
  stlFile.read((char*)&triangles[i].x2, sizeof(float));
  stlFile.read((char*)&triangles[i].y2, sizeof(float));
  stlFile.read((char*)&triangles[i].z2, sizeof(float));
  stlFile.read((char*)&triangles[i].x3, sizeof(float));
  stlFile.read((char*)&triangles[i].y3, sizeof(float));
  stlFile.read((char*)&triangles[i].z3, sizeof(float));
  stlFile.read(&triangles[i].byte1, 1);
  stlFile.read(&triangles[i].byte2, 1);
}

It takes a little more code and a little more time to read in the triangles, but you'll avoid a few potential headaches.

Note that writing triangles also requires similar code to avoid inadvertently writing out some padding.

Darryl
A: 

Storing a struct entirely at once isn't portable unless you take great care with compiler-specific flags, and even then not all compilers and architectures may allow the same binary format. Storing a field (e.g. a floating-point number) at a time is better, but still isn't portable because of endianness issues and possibly different data types (e.g. what is sizeof(long) on your system?).

In order to save integers safely and portably, you have to format them byte at a time into a char buffer that will then be written out to a file. E.g.

char buf[100];  // Extra space for more values (instead of only 4 bytes)
// Write a 32 bit integer value into buf, using big endian order
buf[0] = value >> 24;  // The most significant byte
buf[1] = value >> 16;
buf[2] = value >> 8;
buf[3] = value;  // The least significant byte

Similarly, reading back has to be done a byte at a time:

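// Converting the pointer to unsigned to avoid sign extension issues
unsigned char* ubuf = reinterpret_cast<unsigned char*>(buf);
// Widen each byte to unsigned long before shifting, so that shifting by 24 cannot overflow a signed int
value = static_cast<unsigned long>(ubuf[0]) << 24 | static_cast<unsigned long>(ubuf[1]) << 16
      | static_cast<unsigned long>(ubuf[2]) << 8 | ubuf[3];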

If little endian order is desired, invert the indexing order of buf and ubuf.

Because no pointer casting of integer types to char or vice versa is done, the code is fully portable. Doing the same for floating-point types requires extra caution and a pointer cast so that the value can be handled as an integer, so that bit shifting works. I won't cover that in detail here.
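For reference, one way the floating-point case could be handled is sketched below. It uses std::memcpy rather than a pointer cast to move the bits between the integer and the float. It assumes C++11, that float is a 32-bit IEEE 754 type, and that floats and 32-bit integers share the same byte order on the host. readFloatLE is a made-up helper name; little endian is chosen because that is the order in the hex dump from the question.

```cpp
#include <cstdint>
#include <cstring>

static_assert(sizeof(float) == sizeof(std::uint32_t),
              "this sketch assumes a 32-bit float");

// Builds a float from 4 bytes stored in little endian order.
float readFloatLE(const unsigned char* bytes)
{
    // Assemble the 32-bit pattern a byte at a time; widening each byte
    // first keeps the shifts well defined.
    const std::uint32_t bits =
          static_cast<std::uint32_t>(bytes[0])
        | static_cast<std::uint32_t>(bytes[1]) << 8
        | static_cast<std::uint32_t>(bytes[2]) << 16
        | static_cast<std::uint32_t>(bytes[3]) << 24;

    // Copy the bit pattern into a float without any pointer aliasing.
    float value;
    std::memcpy(&value, &bits, sizeof value);
    return value;
}
```

For example, the bytes 00 00 80 bf from the dump in the question decode to -1.0 under these assumptions.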

While this solution seems extremely painful to use, you only need to write a few helper functions to make it tolerable. Alternatively, especially if the exact format used does not matter to you, you can use an existing serialization library. Boost.Serialization is a rather nice library for that.

Tronic