tags:

views:

305

answers:

6

#post

The names of my variables are not imporant! This code will be deleted when it works!

#post

Alright, so I'm using fread in stdio.h to read a text file. The problem is that I keep reading random bytes that don't exist in the text file from my knowledge. I'm assuming they are part of files scheme, but I just wanna make sure it's not my code.

#include "stdafx.h"
#ifdef WIN32
    #include <io.h>
#else
    #include <sys/io.h>
#endif
#include <fcntl.h>
#include <sys/types.h>
#include <sys/stat.h>
#include <stdio.h>

#include "n_script_timer.h"
//using namespace std;

#ifdef _INC_WCHAR
    typedef wchar_t CHR;
#else
    typedef char CHR;
#endif
int _tmain(int argc, CHR* argv[])
{
    #ifndef _DEBUG
        if(argc == 1)
        {
            printf("You must drag a file onto this program to run it.");
            scanf("%*c");
            return 0;
        }
        CHR* fname = argv[1];
    #else
        #ifdef _INC_WCHAR
            const CHR fname[16] = L"f:\\deleteme.bin";
        #else
            const CHR fname[16] = "f:\\deleteme.bin";
        #endif
    #endif

    FILE* inFile;
    long len;
    struct Script_Timer a;
    //static const int bsize = 4096*6;
    static const int bsize = 84;
    typedef CHR chhh[bsize];
    int alen;
    printf("#Opening File '%s' ...\n",fname);
    #ifdef _INC_WCHAR
        if((inFile = _wfopen(fname,L"rb")) == NULL)
    #else
        if((inFile = fopen(fname,"r")) == NULL)
    #endif
    {
     printf("Error opening file '%s' ",fname);
     return 0;
    }
    fseek(inFile,SEEK_SET,0);
    #ifdef _WIN32
        len = _filelength( inFile->_file );
    #else
        len = _filelength(inFile->_fileno);
    #endif
    printf("  !FileLength: %d\n",len);
    printf("#Creating Buffers...\n");
    if(((float)len/(float)bsize) > (len/bsize))
    {
     alen = (len/bsize) + 1;
    }
    else alen = (len/bsize);
    #ifdef WIN32
     //chhh *cha = new chhh[alen];
     chhh cha[alen];
    #else
     chhh cha[alen];
    #endif
    printf("#Reading File...\n");
    Start_ST(&a);
    int i = 0;
    for(i=0;i<alen;++i)
    {
     fread(&cha[i],sizeof(CHR),bsize,inFile);
     printf("[%i]%s",i,cha[i]);
    }
    End_ST(&a);
    fclose(inFile);
    printf("Characters per millisecond: %f \n",((float)len/a.milliseconds));
    printf("Characters per second: %f \n",((float)len/a.milliseconds) * 1000);
    scanf("%*c");
    return 0;
}
+1  A: 

the cha buffer should be filled with null (0) before or you gonna you going to get some garbage.

printf("[%i]%s",i,cha[i]);

Like printf is outputing to screen until it meets NULL, so in the best case you are going to have some garbage, worst some access violation because you access memory that you don't own.

Note: I advise you to give meaningful name to your variable/typedef etc like chhh is not really nice. It would be a pain in few month even for you to modify such code!

RageZ
The vs compiler doesn't like assigning dynamically assigned arrays, but the GNU compiler has no problem with it? Not sure why though. It didn't however let me dynamically size a 2d array, that's why I went the long route with the typedef and whatnot.
kelton52
GCC has had that as an extension for a while now. It's also part of C99.
Nate C-K
oky, I am not really on top last C standard.
RageZ
It is true that you can't rely on it yet if you want to be portable, since some compilers haven't implemented support for it yet. I've read recommendations that you should prefer alloca instead as it has wider support and also does more error checking.
Nate C-K
+1  A: 
typedef CHR chhh[bsize];

but

fread(&cha[i], sizeof(CHR), bsize, inFile);

In C++, you need an extra byte for the '\0' at the end of a string.

Chip Uni
+6  A: 

A couple of weird things here:

int i = 0;
for(i=0;i<alen;++i)
{
   fread(&cha[i],sizeof(CHR),bsize,inFile);
   printf("[%i]%s",i,cha[i]);
}
  1. You don't null terminate the buffer before printing it (as RageZ pointed out).

  2. You increment i on each loop repetition, but every time you read 84 chars (bsize) into &cha[i]. I think this should mean you're only seeing every 84th character.

Also, if I were you I'd be checking the return value of fread every time. It's not guaranteed to always return the number of bytes you're expecting.


EDIT: The size of the block you're reading is fine. I got confused for a minute by the typedef. Every time you increment i by 1, it advances the pointer by 84*sizeof(CHR), as you intended. Still, you can't guarantee that it read the number of bytes that you think it did. If it came up short then you'll be left with junk in the buffer: say it read 60 chars, that leaves 24 junk chars before the insertion point for the next read.

Nate C-K
Why wouldn't it return the right number of bytes, do I end up losing information then, or is it picked up on the next pass?
kelton52
The file pointer is advanced, but the target buffer pointer isn't.
peterchen
You will get the subsequent characters in the next read (assuming there wasn't an error in the stream), but you won't be inserting them at the right spot in your buffer. Currently you assume that your read always gets 84 CHR's, so you always advance your pointer 84 chars ahead. Instead, check how many CHR's were read and advance by that many.
Nate C-K
+2  A: 

Note, your alen calculation is going to be wrong if you're using the wchar_t code path because bsize is the element count for the array, not its size in bytes.

I would suggest you try changing your variable names to accurately describe what they mean, you'll find it much easier to spot errors if you do.

Andy J Buchanan
Thanks, that crossed my mind, but I didn't pay it much attention.
kelton52
A: 
Thomas Matthews
A: 

Solution...Good or not?

Something else I've noticed...I'll load a file for the first time, and it'll read around 40 million characters a second, but If I open the application again, and read that same file it will read much faster...Any ideas why? Is it maybe because the OS kinda maps the file because it was used recently?

#include "stdafx.h"
#ifdef WIN32
    #include <io.h>
#else
    #include <sys/io.h>
#endif
#include <stdio.h>
#include <stdlib.h>

#include "n_script_timer.h"

#ifndef _INC_WCHAR
    typedef char _TCHAR;
#endif
int _tmain(int argc, _TCHAR* argv[])
{
    #ifndef _DEBUG
        if(argc == 1)
        {
            printf("You must drag a file onto this program to run it.");
            scanf("%*c");
            return 0;
        }
        _TCHAR* fname = argv[1];
    #else
        #ifdef _INC_WCHAR
            const _TCHAR fname[16] = L"f:\\deleteme.bin";
        #else
            const _TCHAR fname[16] = "f:\\deleteme.bin";
        #endif
    #endif
    struct Script_Timer a;
    FILE *inFile;
    long pos = 0;
    long len = 0;
    char *buffer;
    printf("#Opening File '%s' ...\n",fname);
    #ifdef _INC_WCHAR
        if((inFile = _wfopen(fname,L"rb")) == NULL)
        {
      printf("#File doesn't exist...\n");
      return 0;
     }
    #else
        if((inFile = fopen(fname,"r")) == NULL)
        {
      printf("#File doesn't exist...\n");
      return 0;
     }
    #endif
    printf("#Setting File position...\n");
    fseek(inFile, 0, SEEK_END);
    pos = ftell(inFile);
    fseek(inFile, 0, SEEK_SET);
    printf("#Getting File Length...\n");
    #ifdef _WIN32
        len = _filelength( inFile->_file );
    #else
        len = _filelength(inFile->_fileno);
    #endif  
    printf("#Sizing Buffer...\n");
    buffer = (char*)malloc(pos);
    if(buffer == NULL)
    {
     printf("Not enough free memory to load the file...\n");
     scanf("%*c");
     return 0;
    }
    printf("#Starting Read...\n");
    Start_ST(&a);
    fread(buffer, pos, 1, inFile);
    End_ST(&a);
    printf("#Read Ended...\n");
    fclose(inFile);
    printf("Characters per millisecond: %f \n",((float)len/a.milliseconds));
    printf("Characters per second: %f \n",((float)len/a.milliseconds) * 1000);
    scanf("%*c");
    free(buffer);
    return 0;
}
kelton52
Yes, that's right. The OS has retained the file (or part of it, at least) in cache, making subsequent reads much faster than the first one.
Nate C-K