I have several files whose contents need to be merged into a single file. I have the following code that does this, but it seems rather inefficient in terms of memory usage. Would you suggest a better way to do it?

The Util.MoveFile function simply handles moving files across volumes.

   private void Compose(string[] files)
   {
       string outFile = @"c:\final.txt";

       using (FileStream fsOut = new FileStream(outFile + ".tmp", FileMode.Create))
       {
           foreach (string inFile in files)
           {
               if (!File.Exists(inFile))
               {
                   continue;
               }

               byte[] bytes;
               using (FileStream fsIn = new FileStream(inFile, FileMode.Open))
               {
                   // read the whole file into memory at once (assumes a single Read call fills the buffer)
                   bytes = new byte[fsIn.Length];
                   fsIn.Read(bytes, 0, bytes.Length);
               }

               //using (StreamReader sr = new StreamReader(inFile))
               //{
               //    text = sr.ReadToEnd();
               //}

               // write the segment to final file
               fsOut.Write(bytes, 0, bytes.Length);

               File.Delete(inFile);
           }
       }

       Util.MoveFile(outFile + ".tmp", outFile);

   }

+1  A: 

Sometimes it's just better to call a shell function than to reimplement the functionality. As Alan says, you can use cat on Unix systems, or on Windows you can use the built-in command processor:

copy file1+file2+file3 concated_file
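
For reference, a minimal sketch of invoking that copy command from C# might look like the following; the helper name is made up, "/b" forces a binary copy, and paths containing spaces would need individual quoting:

using System.Diagnostics;

// Hypothetical helper (not part of the original post): concatenate files by
// shelling out to cmd.exe's copy command. "/b" forces binary mode so no EOF
// marker is appended to the output.
static void ConcatWithCopy(string[] files, string outFile)
{
    string sources = string.Join("+", files); // file1+file2+file3
    ProcessStartInfo psi = new ProcessStartInfo("cmd.exe",
        "/c copy /b " + sources + " \"" + outFile + "\"");
    psi.UseShellExecute = false;
    psi.CreateNoWindow = true;

    using (Process p = Process.Start(psi))
    {
        p.WaitForExit();
    }
}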
Preet Sangha
Thanks Preet Sangha, you're right as well. There's an overhead in creating the new shell process/thread, but ultimately it should be more efficient than anything I could possibly do. I will try to implement it and profile.
lboregard
The important thing is what you said last: profile!
Preet Sangha
A: 

You can use a smaller, fixed-size buffer, like so:

byte[] bytes = new byte[8192]; // adjust this as needed
int bytesRead;
do {
    bytesRead = fsIn.Read(bytes, 0, bytes.Length);
    fsOut.Write(bytes, 0, bytesRead);
} while (bytesRead > 0);

This is pretty self-explanatory except for the last block. What's happening is that I'm passing an 8 KB byte array to the Read method, which returns the number of bytes it actually read. On the Write call, I then pass that value, which is somewhere between 0 and 8192. In other words, on the last block, even though I'm passing a byte array of 8192 bytes, bytesRead might only be 10, in which case only the first 10 bytes need to be written.

EDIT

I edited my answer to do this in a slightly different way. Instead of using the input file's position to determine when to break out of the loop, I check whether bytesRead is greater than zero. This approach works for any kind of stream-to-stream copy, including streams that don't have a fixed or known length.
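
Folded back into the Compose method from the question, a sketch of the streaming version might look like this; the 64 KB buffer and the explicit bufferSize argument on the FileStream constructors are only illustrative values to tune while profiling, and the do/while above has been rewritten as an equivalent while loop:

private void Compose(string[] files)
{
    string outFile = @"c:\final.txt";
    byte[] buffer = new byte[64 * 1024]; // example size only; profile to pick a real value

    // The explicit bufferSize argument is optional; it is one of the
    // "specific constructors" mentioned in the comments below.
    using (FileStream fsOut = new FileStream(outFile + ".tmp",
        FileMode.Create, FileAccess.Write, FileShare.None, buffer.Length))
    {
        foreach (string inFile in files)
        {
            if (!File.Exists(inFile))
            {
                continue;
            }

            using (FileStream fsIn = new FileStream(inFile,
                FileMode.Open, FileAccess.Read, FileShare.Read, buffer.Length))
            {
                int bytesRead;
                while ((bytesRead = fsIn.Read(buffer, 0, buffer.Length)) > 0)
                {
                    fsOut.Write(buffer, 0, bytesRead);
                }
            }

            File.Delete(inFile);
        }
    }

    Util.MoveFile(outFile + ".tmp", outFile);
}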

Josh Einstein
Thanks Josh, I will implement this and profile the results.
lboregard
Note that, as with anything, there's a tradeoff between performance and memory usage. The fastest way to do this is the way you originally showed it, using as much memory as required. Also note that unless you use specific constructors, the default buffering done by .NET will still take place, and the file system will still apply its own write buffering. But at least this approach lets you process huge files without fear of an OutOfMemoryException.
Josh Einstein
I would use a bigger buffer, like 1 MB for a typical disk. A back-of-the-envelope way to calculate the buffer size needed to use a non-SSD disk efficiently: multiply the disk seek time by the throughput. For modern disks you get (~10 ms) * (~100 MB/sec) = 1 MB. 8 KB might noticeably slow you down.
Michael
Completely agree with Michael that chunking 8K at a time will likely cause bottlenecks. I should have clarified that it was just an example number; the best buffer size will depend on factors such as how many of these you'll be doing at a time (is it a web server?) and the average size of the files.
Josh Einstein
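
Since the comments above stress profiling, here is a rough, hypothetical sketch of how buffer sizes could be compared; the sizes and the temporary file naming are arbitrary, and the OS file cache will skew repeated runs, so treat the numbers only as rough guidance:

using System;
using System.Diagnostics;
using System.IO;

// Hypothetical micro-benchmark: copy one source file with several buffer sizes
// and report the elapsed time for each. Results are heavily affected by the
// OS file cache, so run it a few times and compare trends, not single numbers.
static void ProfileBufferSizes(string sourceFile)
{
    int[] sizes = { 8 * 1024, 64 * 1024, 1024 * 1024 };
    foreach (int size in sizes)
    {
        string target = sourceFile + "." + size + ".tmp";
        byte[] buffer = new byte[size];
        Stopwatch sw = Stopwatch.StartNew();

        using (FileStream fsIn = new FileStream(sourceFile, FileMode.Open, FileAccess.Read))
        using (FileStream fsOut = new FileStream(target, FileMode.Create, FileAccess.Write))
        {
            int bytesRead;
            while ((bytesRead = fsIn.Read(buffer, 0, buffer.Length)) > 0)
            {
                fsOut.Write(buffer, 0, bytesRead);
            }
        }

        sw.Stop();
        Console.WriteLine("{0,10} byte buffer: {1} ms", size, sw.ElapsedMilliseconds);
        File.Delete(target);
    }
}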