tags:

views:

434

answers:

8

In another question, the accepted answer shows a method for reading the contents of a file into memory.

I have been trying to use this method to read in the content of a text file and then copy it to a new file. When I write the contents of the buffer to the new file, however, there is always some extra garbage at the end of the file. Here is an example of my code:

inputFile = fopen("D:\\input.txt", "r");
outputFile = fopen("D:\\output.txt", "w");

if(inputFile)
{
 //Get size of inputFile
 fseek(inputFile, 0, SEEK_END);
 inputFileLength = ftell(inputFile);
 fseek(inputFile, 0, SEEK_SET);

 //Allocate memory for inputBuffer 
 inputBuffer = malloc(inputFileLength);

 if(inputBuffer)
 {
  fread (inputBuffer, 1, inputFileLength, inputFile);
 }

 fclose(inputFile);

 if(inputBuffer)
 {
  fprintf(outputFile, "%s", inputBuffer);
 }

 //Cleanup
 free(inputBuffer);
 fclose(outputFile);
}

The output file always contains an exact copy of the input file, but then has the text "MPUTERNAM2" appended to the end. Can anyone shed some light as to why this might be happening?

+4  A: 

You haven't allocated enough space for the terminating null character in your buffer (and you also forget to actually set it), so your fprintf is effectively overreading into some other memory. Your buffer is exactly the same size as the file, and is filled with its content, however, fprintf reads the parameter looking for the terminating null, which isn't there, until a couple of characters later where, coincidently, there is one.

EDIT
You're actually mixing two types of io, fread (which is paired with fwrite) and fprintf (which is paired with fscanf). You should probably be doing fwrite with the number of bytes to write; or conversely, use fscanf, which would null-terminate your string (although, this wouldn't allow nulls in your string).

roe
+1  A: 

fprintf expects inputBuffer to be null-terminated, which it isn't. So it's reading past the end of inputBuffer and printing whatever's there (into your new file) until it finds a null character.

In this case you could malloc an extra byte and put a null as the last character in inputBuffer.

Francisco Canedo
+4  A: 

Because you are writing the buffer as if it were a string. Strings end with a NULL, the file you read does not.

You could NULL terminate your string, but a better solution is to use fwrite() instead of fprintf(). This would also let you copy files that contain NULL characters.

Unless you know the input file will always be small, you might consider reading/writing in a loop so that you can copy files larger than memory.

+1 for the suggestion to read/write in a loop.
j_random_hacker
+1  A: 

In addition to what other's have said: You should also open your files in binary-mode - otherwise, you might get unexpected results on Windows (or other non-POSIX systems).

Christoph
+7  A: 

You may be happier with

int numBytesRead = 0;
if(inputBuffer)
{
  numBytesRead = fread (inputBuffer, 1, inputFileLength, inputFile);
}

fclose(inputFile);

if(inputBuffer)
{
  fwrite( inputBuffer, 1, numBytesRead, outputFile );
}

It doesn't need a null-terminated string (and therefore will work properly on binary data containing zeroes)

Crashworks
I hadn't realized that fread returns the number of elements read. This is much cleaner than my original code. Thanks!
Tim
You could even do something kooky like fwrite( inputBuffer, 1, fread(inputBuffer, 1, inputFileLength, inputFile), outputFile );but that may alarm your colleagues. ;)
Crashworks
Tim, the main point is using fwrite() instead of fprintf(). It's good form to save numBytesRead and check that it equals inputFileLength, but that's less important.
j_random_hacker
You should only use this "read the whole file into memory" solution if the files are small. Multi-gig files won't fit at all and you should be keeping your resource requirements relatively low so as to get along nicely with other programs.
paxdiablo
+1  A: 

You can use

fwrite (inputBuffer , 1 , inputFileLength , outputFile );

instead of fprintf, to avoid the zero-terminated string problem. It also "matches better" with fread :)

Federico Ramponi
A: 

Try using fgets instead, it will add the null for you at the end of the string. Also as was said above you need one more space for the null terminator.

ie

The string "Davy" is represented as the array that contains D,a,v,y,\0 (without the commas). Basically your array needs to be at least sizeofstring + 1 to hold the null terminator. Also fread will not automatically add the terminator, which is why even if your file is way shorter than the maximum length you get garbage..

Note an alternative method for being lazy is just to use calloc which sets the string to 0. But still you should only fread inputFileLength-1 characters at most.

Cervo
+2  A: 

Allocating memory to fit the file is actually quite a bad way of doing it, especially the way it's done here. If the malloc() fails, no data is written to the output file (and it fails silently). In other words, you can't copy files greater than a few gigabytes on a 32-bit platform due to the address space limitations.

It's actually far better to use a smaller memory chunk (allocated or on the stack) and read/write the file in chunks. The reads and writes will be buffered anyway and, as long as you make the chunks relatively large, the overhead of function calls to the C runtime libraries is minimal.

You should always copy files in binary mode as well, it's faster since there's no chance of translation.

Something like:

FILE *fin = fopen ("infile","rb");  // make sure you check these for NULL return
FILE *fout = fopen ("outfile","wb");
char buff[1000000];  // or malloc/check-null if you don't have much stack space.
while ((count = fread (buff, 1, sizeof(buff), fin)) > 0) {
    // Check count == -1 and errno here.
    fwrite (buff, 1, count, fout); // and check return value.
}
fclose (fout);
fclose (fin);

This is from memory but provides the general idea of how to do it. And you should always have copiuos error checking.

paxdiablo