tags:
views: 785
answers: 2

Hello,

In my case I have five huge text files which I have to combine into one text file.

I tried with StreamReader(), but I don't know how to make it read one more file. Do I have to assign another variable?

An example would be greatly appreciated.

+7  A: 

New answer

(See explanation for junking original answer below.)

static void CopyFiles(string dest, params string[] sources)
{
    using (TextWriter writer = File.CreateText(dest))
    {
        // Somewhat arbitrary limit, but it won't go on the large object heap
        char[] buffer = new char[16 * 1024]; 
        foreach (string source in sources)
        {
            using (TextReader reader = File.OpenText(source))
            {
                int charsRead;
                while ((charsRead = reader.Read(buffer, 0, buffer.Length)) > 0)
                {
                    writer.Write(buffer, 0, charsRead);
                }
            }
        }
    }
}

This new answer is quite like Martin's approach, except:

  • It reads into a smaller buffer; 16K is going to be acceptable in almost all situations, and won't end up on the large object heap (which doesn't get compacted)
  • It reads text data instead of binary data, for two reasons:
    • The code can easily be modified to convert from one encoding to another (see the sketch after this list)
    • If each input file contains a byte-order mark, that will be skipped by the reader, instead of ending up with byte-order marks scattered through the output file at input file boundaries
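
For example, a rough sketch of that kind of modification might look like this (the ConvertAndCopyFiles name and the specific encodings are purely illustrative and not part of the answer above; it also assumes using System.IO and System.Text directives). The reader decodes with the source encoding and the writer re-encodes with the destination encoding:

static void ConvertAndCopyFiles(string dest, Encoding sourceEncoding,
                                Encoding destEncoding, params string[] sources)
{
    using (TextWriter writer = new StreamWriter(dest, false, destEncoding))
    {
        // Same bounded copy loop as above, just with explicit encodings
        char[] buffer = new char[16 * 1024];
        foreach (string source in sources)
        {
            using (TextReader reader = new StreamReader(source, sourceEncoding))
            {
                int charsRead;
                while ((charsRead = reader.Read(buffer, 0, buffer.Length)) > 0)
                {
                    writer.Write(buffer, 0, charsRead);
                }
            }
        }
    }
}

A hypothetical call such as ConvertAndCopyFiles("all.txt", Encoding.GetEncoding(1252), Encoding.UTF8, "a.txt", "b.txt") would merge two Windows-1252 files into a single UTF-8 file.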

Original answer

Martin Stettner pointed out an issue with the answer below: if the first file ends without a newline, it will still create a newline in the output file. Also, it will translate newlines into "\r\n" even if they were previously just "\r" or "\n". Finally, it pointlessly risks using large amounts of memory for long lines.

Something like:

static void CopyFiles(string dest, params string[] sources)
{
    using (TextWriter writer = File.CreateText(dest))
    {
        foreach (string source in sources)
        {
            using (TextReader reader = File.OpenText(source))
            {
                string line;
                while ((line = reader.ReadLine()) != null)
                {
                    writer.WriteLine(line);
                }
            }
        }
    }
}

Note that this reads line by line to avoid reading too much into memory at a time. You could make it simpler if you're happy to read each file completely into memory (still one at a time):

static void CopyFiles(string dest, params string[] sources)
{
    using (TextWriter writer = File.CreateText(dest))
    {
        foreach (string source in sources)
        {
            string text = File.ReadAllText(source);
            writer.Write(text);
        }
    }
}
Jon Skeet
Skeet beats me again!
jrcs3
He just took a look at his keyboard and it began typing the answer with the speed of light. :)
John
Skeet must be an android. He can't possibly be human. +1
ichiban
Please correct me if I'm wrong, but the first version will insert additional EOL characters if one file doesn't have one at its end. So the two programs will not have the same behaviour, imo. Also, you could theoretically run into trouble if you have really long lines (such that ReadLine isn't able to read them into its internal buffer). I think a version using a preallocated buffer and Stream.Read/Stream.Write might be more robust.
MartinStettner
Ooh yes, you're right about the first one. I don't agree about using a stream directly though. Will post an updated version.
Jon Skeet
Oh, and the ReadLine issue would only be a problem if a single line was enough to exceed not just an internal buffer, but your system memory. Still, it is a pointless limit.
Jon Skeet
Sure, that's why I wrote "theoretically" :-))) On the other hand, if you work on really big files, you might also run into performance troubles because of the additional encoding work done by StreamReader. A version using FileStreams might help at least in these cases (I admit that the TextWriter-based solution will be better in most circumstances)
MartinStettner
I'd expect the IO overhead to overwhelm the computational overhead - and with prefetch, hopefully those two *will* occur in parallel. But yes, it's certainly doing more work.
Jon Skeet
+2  A: 

Edit:

As Jon Skeet pointed out, text files should usually be handled differently from binary files.

I just leave this answer since it might be more performant if you have really big files and aren't concerned about encoding issues (such as different input files having different encodings, or multiple byte order marks ending up in the output file):

public void CopyFiles(string destPath, string[] sourcePaths) {
  byte[] buffer = new byte[10 * 1024 * 1024]; // Just allocate a buffer as big as you can afford
  using (var destStream = new FileStream(destPath, FileMode.Create)) {
    foreach (var sourcePath in sourcePaths) {
      int read;
      using (var sourceStream = new FileStream(sourcePath, FileMode.Open)) {
        // Copy raw bytes; Read returns 0 once the end of the source file is reached
        while ((read = sourceStream.Read(buffer, 0, buffer.Length)) != 0)
          destStream.Write(buffer, 0, read);
      }
    }
  }
}
MartinStettner
It does make a difference - consider what happens if all the text files start with a byte order mark. You would want your output to only have a single one.
Jon Skeet
Thank you for pointing this out. Interestingly, MSDN doesn't mention that File.OpenText() (or even StreamReader) consumes the byte order mark. BOMs aren't even mentioned in the documentation for all of the StreamReader constructors. Moreover, MSDN states that File.OpenText works with UTF-8 files, while it really uses the same detection mechanism as StreamReader (thus working perfectly well with any other supported encoding).
MartinStettner
I'd take issue with "any other supported encoding" - it's "any auto-detected encoding", which is a very different thing IMO :)
Jon Skeet
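
To make that distinction concrete: StreamReader's auto-detection only recognizes encodings that announce themselves with a byte order mark (UTF-8, UTF-16, UTF-32); any other supported encoding has to be named explicitly when the reader is constructed. A minimal sketch (the file name and the Windows-1252 encoding are chosen purely for illustration):

using (var reader = new StreamReader("legacy.txt", Encoding.GetEncoding("windows-1252")))
{
    // Decoded as Windows-1252; BOM-based auto-detection would never pick this encoding
    string text = reader.ReadToEnd();
}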