Using Windows

I'm reading a list of unsigned int data values from a binary file. The file contains a number of datasets stored sequentially. Here's the function that reads a single dataset from a char* pointing to the start of it:

void read_dataset(char* stream, t_dataset *dataset){

    //...some init, including setting dataset->size;

    for(i=0;i<dataset->size;i++){
        dataset->samples[i] = *((unsigned int *) stream);
        stream += sizeof(unsigned int);
    }
    //...
}

read_dataset is called in a context such as this:

//...
char buff[10000];
t_dataset* dataset = malloc( sizeof( *dataset) );
unsigned long offset = 0;

for(i=0;i<number_of_datasets; i++){

    fseek(fd_in, offset, SEEK_SET);

    if( (n = fread(buff, sizeof(char), sizeof(*dataset), fd_in)) != sizeof(*dataset) ){
        break;
    }

    read_dataset(buff, dataset);

    // Do something with dataset here.  It's screwed up before this, I checked.


    offset += profileSize;
}
//...

Everything goes swimmingly until my loop reads the number 2573. All of a sudden it starts spitting out random and huge numbers.

For example, what should be

...
1831
2229
2406
2637
2609
2573
2523
2247
...

becomes

...
1831
2229
2406
2637
2609
0xDB00000A
0xC7000009
0xB2000008
...

If you think those hex numbers look suspicious, you're right. The hex representations of the values that got corrupted look very familiar:

2573 -> 0xA0D
2523 -> 0x9DB
2247 -> 0x8C7

So apparently this number 2573 causes my stream pointer to gain a byte. This remains until the next dataset is loaded and parsed, and god forbid it contain a number 2573. I have checked a number of spots where this happens, and each one I've checked began on 2573.

I admit I'm not so talented in the world of C. What could cause this is completely and entirely opaque to me.

A: 

A couple of unrelated points.

sizeof(*dataset) doesn't do what you think it does.

There is no need to seek before every read.

I don't understand how you are calling a function that only takes one parameter but you are giving it two (or at least I don't understand why your compiler doesn't object)

Martin Beckett
I may be mouthfooting, but I did leave some things out of my code. Each dataset contains a small header that indicates the number of samples, which isn't consistent between files. Therefore, t_dataset is the size of the biggest dataset I expect to encounter. That's probably wrong, but in my defense I didn't write that. So, fseek compensates for the overshoot.
Adam Bard
+7  A: 

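You don't specify how you obtained the bytes in memory (pointed to by stream), nor what platform you're running on, but I wouldn't be surprised to find you're on Windows and opened the file with the C stdio call fopen(filename, "r"). Try fopen(filename, "rb") instead. On Windows (and MS-DOS), a stream opened in text mode translates the MS-DOS line ending "\r\n" (hex 0x0D 0x0A) in the file to the Unix-style "\n" (hex 0x0A) as it is read, unless you append "b" to the file mode to indicate binary. The value 2573 is 0x0A0D, stored as the bytes 0D 0A 00 00 on a little-endian machine, so its first two bytes look exactly like a line ending and the 0x0D gets dropped.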

Stephen C. Steel
Good spot (and well done - in order to make it to 15 chars!)
Martin Beckett
Thank you good sir, I understand the error of my ways (and yeah, I forgot the "b").
Adam Bard
I used to occasionally forget the "b", and recognized the symptoms.
Stephen C. Steel