Hello again,

I've got the lovely task of working out how to handle large files being loaded into our application's script editor (it's like VBA for our internal product, used for quick macros). Most files are about 300-400 KB, which load fine. But when they go beyond 100 MB, the process has a hard time, as you'd expect.

What happens is that the file is read and shoved into a RichTextBox which is then navigated - don't worry too much about this part.

The developer who wrote the initial code is simply using a StreamReader and doing [Reader].ReadToEnd() which could take quite a while to complete.

My task is to break this bit of code up, read it in chunks into a buffer, and show a progress bar with an option to cancel the load.

Some assumptions:

  • Most files will be 30-40 MB
  • The contents of the files are text (not binary); some are UNIX format, some are DOS.
  • Once the contents are retrieved, we work out which line terminator is used.
  • No one is concerned about the time it takes to render in the RichTextBox once it's loaded; it's just the initial load of the text.

Now for the questions:

  • Can I simply use StreamReader, check the Length property (to set the progress bar's maximum), and issue a Read for a set buffer size, iterating in a while loop inside a BackgroundWorker so it doesn't block the main UI thread? Then return the StringBuilder to the main thread once it's completed.
  • The contents will be going into a StringBuilder. Can I initialise the StringBuilder with the size of the stream if the length is available?

Are these (in your professional opinions) good ideas? I've had a few issues in the past with reading content from Streams because it would always miss the last few bytes or something, but I'll ask another question if that turns out to be the case. Roughly, what I'm picturing is the sketch below.
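To make that concrete, here is an untested sketch of the loop I have in mind (the progressBar and scriptEditor controls, the buffer size, and the file path are just placeholders, not our real code):

// Untested sketch of the chunked read I have in mind; names are placeholders.
// Requires System.ComponentModel, System.IO, System.Text, System.Windows.Forms.
private void LoadFile(string path)
{
    var worker = new BackgroundWorker
    {
        WorkerReportsProgress = true,
        WorkerSupportsCancellation = true
    };

    worker.DoWork += (s, e) =>
    {
        using (var reader = new StreamReader(path))
        {
            long length = reader.BaseStream.Length;
            // Pre-size the builder; the stream length is in bytes, not chars,
            // so this is only an estimate for multi-byte encodings.
            var sb = new StringBuilder((int)length);
            var buffer = new char[8192];
            long totalRead = 0;
            int read;

            while ((read = reader.Read(buffer, 0, buffer.Length)) > 0)
            {
                if (worker.CancellationPending) { e.Cancel = true; return; }
                sb.Append(buffer, 0, read);
                totalRead += read;
                // Approximate progress: chars read vs. bytes in the stream.
                worker.ReportProgress((int)Math.Min(100, totalRead * 100 / length));
            }

            e.Result = sb;
        }
    };

    worker.ProgressChanged += (s, e) => progressBar.Value = e.ProgressPercentage;
    worker.RunWorkerCompleted += (s, e) =>
    {
        if (!e.Cancelled && e.Error == null)
            scriptEditor.Text = ((StringBuilder)e.Result).ToString();
    };

    worker.RunWorkerAsync();
}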

+1  A: 

Use a background worker and read only a limited number of lines; read more only when the user scrolls.

And try to never use ReadToEnd(). It's one of those functions that makes you wonder "why did they make it?" - a script-kiddie helper that's fine for small things, but as you've seen, it sucks for large files...
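As a rough illustration of the limited-lines idea (just a sketch, not code from the question; the method name and page size are made up), you can keep the reader open and hand back one page of lines at a time:

// Sketch: pull lines on demand instead of reading the whole file up front.
// Requires System.Collections.Generic and System.IO.
static List<string> ReadNextPage(StreamReader reader, int pageSize)
{
    var lines = new List<string>(pageSize);
    string line;
    while (lines.Count < pageSize && (line = reader.ReadLine()) != null)
        lines.Add(line);
    return lines; // an empty list means end of file
}

The caller keeps the StreamReader alive and calls ReadNextPage again whenever the user scrolls near the end of what has already been shown.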

EDIT
Those guys telling you to use StringBuilder need to read the MSDN more often:

Performance Considerations
The Concat and AppendFormat methods both concatenate new data to an existing String or StringBuilder object. A String object concatenation operation always creates a new object from the existing string and the new data. A StringBuilder object maintains a buffer to accommodate the concatenation of new data. New data is appended to the end of the buffer if room is available; otherwise, a new, larger buffer is allocated, data from the original buffer is copied to the new buffer, then the new data is appended to the new buffer. The performance of a concatenation operation for a String or StringBuilder object depends on how often a memory allocation occurs.
A String concatenation operation always allocates memory, whereas a StringBuilder concatenation operation only allocates memory if the StringBuilder object buffer is too small to accommodate the new data. Consequently, the String class is preferable for a concatenation operation if a fixed number of String objects are concatenated. In that case, the individual concatenation operations might even be combined into a single operation by the compiler. A StringBuilder object is preferable for a concatenation operation if an arbitrary number of strings are concatenated; for example, if a loop concatenates a random number of strings of user input.

That means a huge amount of memory allocation, which leads to heavy use of the swap file - the system uses sections of your HDD to act like RAM, but the HDD is very slow. The StringBuilder option looks fine when the system has a single user, but when you have two or more users reading large files at the same time, you have a problem.

Tufo
Far out, you guys are super quick! Unfortunately, because of the way the macros work, the entire stream needs to be loaded. As I mentioned, don't worry about the rich text part; it's the initial loading we're wanting to improve.
Nicole Lee
So you can work in parts: read the first X lines, apply the macro, read the second X lines, apply the macro, and so on... If you explain what the macro does, we can help you with more precision.
Tufo
A: 

You might be better off using memory-mapped file handling here. Memory-mapped file support will be built into .NET 4 (I think... I heard that through someone else talking about it), hence this wrapper which uses P/Invoke to do the same job.

Edit: See here on the MSDN for how it works, and here's the blog entry describing how it will be done in .NET 4 when it is released. The link I gave earlier is a wrapper around the P/Invoke calls to achieve this. You can map the entire file into memory and view it like a sliding window as you scroll through the file.
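For what it's worth, here's a minimal sketch of the .NET 4 API the blog entry describes (the path handling, window size, and encoding are placeholder assumptions; on .NET 3.5 you would go through the P/Invoke wrapper instead):

// Sketch: map the file, then decode only the window currently needed.
// Requires System, System.IO, System.IO.MemoryMappedFiles, System.Text (.NET 4).
static string ReadWindow(string path, long offset, int windowSize)
{
    long fileLength = new FileInfo(path).Length;
    int count = (int)Math.Min(windowSize, fileLength - offset);

    using (var mmf = MemoryMappedFile.CreateFromFile(path, FileMode.Open, null, 0, MemoryMappedFileAccess.Read))
    using (var view = mmf.CreateViewStream(offset, count, MemoryMappedFileAccess.Read))
    {
        var bytes = new byte[count];
        int read = view.Read(bytes, 0, bytes.Length);
        // Pick the encoding that matches your files; the question says plain text.
        return Encoding.Default.GetString(bytes, 0, read);
    }
}

The editor would call this with a new offset as the user scrolls, rather than ever holding the whole file as a string.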

Hope this helps, Best regards, Tom.

tommieb75
+1  A: 

Have a look at the following code snippet. You mentioned that most files will be 30-40 MB; the original article claims this reads a 180 MB text file in 1.4 seconds on an Intel Quad Core:

private int _bufferSize = 16384;

private string ReadFile(string filename)
{
    StringBuilder stringBuilder = new StringBuilder();

    using (FileStream fileStream = new FileStream(filename, FileMode.Open, FileAccess.Read))
    using (StreamReader streamReader = new StreamReader(fileStream))
    {
        char[] fileContents = new char[_bufferSize];
        int charsRead = streamReader.Read(fileContents, 0, _bufferSize);

        // Can't do much with an empty file
        if (charsRead == 0)
            throw new Exception("File is 0 bytes");

        while (charsRead > 0)
        {
            // Append only the characters actually read on this pass,
            // otherwise the final chunk drags in stale buffer contents.
            stringBuilder.Append(fileContents, 0, charsRead);
            charsRead = streamReader.Read(fileContents, 0, _bufferSize);
        }
    }

    return stringBuilder.ToString();
}

Original Article

James
@James: to quote from the original article, 'This example reads a 180mb text file in 1.4 seconds on an Intel Quad Core'... I think you should edit your answer to put that in... mileage will vary...
tommieb75
@tommie, ah good spot will do.
James
These kinds of tests are notoriously unreliable. You'll read data from the file system cache when you repeat the test. That's at least one order of magnitude faster than a real test that reads the data off the disk. A 180 MB file cannot possibly take less than 3 seconds. Reboot your machine and run the test once for the real number.
Hans Passant
+2  A: 

This should be enough to get you started.

using System;
using System.IO;
using System.Text;

class Program
{        
    static void Main(String[] args)
    {
        const int bufferSize = 1024;

        var sb = new StringBuilder();
        var buffer = new Char[bufferSize];
        var length = 0L;
        var totalRead = 0L;
        var count = bufferSize; 

        using (var sr = new StreamReader(@"C:\Temp\file.txt"))
        {
            // Length of the underlying stream: use it as the progress maximum.
            length = sr.BaseStream.Length;               
            while (count > 0)
            {                    
                count = sr.Read(buffer, 0, bufferSize);
                sb.Append(buffer, 0, count);
                // totalRead / length gives the fraction to report to the UI.
                totalRead += count;
            }                
        }

        Console.ReadKey();
    }
}
ChaosPandion
I would move the "var buffer = new char[1024]" out of the loop: it's not necessary to create a new buffer each time. Just put it before "while (count > 0)".
Tommy Carlier
Good point, gotta keep the GC happy.
ChaosPandion
+4  A: 

You say you have been asked to show a progress bar while a large file is loading. Is that because the users genuinely want to see the exact percentage of the file that has loaded, or just because they want visual feedback that something is happening?

If the latter is true, then the solution becomes much simpler. Just do reader.ReadToEnd() on a background thread, and display a marquee-type progress bar instead of a proper one.

I raise this point because in my experience this is often the case. When you are writing a data processing program, then users will definitely be interested in a % complete figure, but for simple-but-slow UI updates, they are more likely to just want to know that the computer hasn't crashed. :-)
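As a rough sketch of that simpler route (WinForms assumed; `progressBar` and `scriptEditor` are placeholder control names, not from the question):

// Sketch: ReadToEnd on a background thread with a marquee-style progress bar.
// Requires System.ComponentModel, System.IO, and System.Windows.Forms.
private void LoadFileSimple(string path)
{
    progressBar.Style = ProgressBarStyle.Marquee;   // just "something is happening"
    var worker = new BackgroundWorker();

    worker.DoWork += (s, e) =>
    {
        using (var reader = new StreamReader(path))
            e.Result = reader.ReadToEnd();
    };

    worker.RunWorkerCompleted += (s, e) =>
    {
        progressBar.Style = ProgressBarStyle.Blocks;
        if (e.Error == null)
            scriptEditor.Text = (string)e.Result;
    };

    worker.RunWorkerAsync();
}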

Christian Hayter
Sagely Advice...
ChaosPandion
But can the user cancel out of the ReadToEnd call?
Tim
@Tim, well spotted. In that case, we're back to the `StreamReader` loop. However, it will still be simpler because there's no need to read ahead to calculate the progress indicator.
Christian Hayter
A: 

I know I am a little late on this one, but an iterator might be perfect for this type of work:

public static IEnumerable<int> LoadFileWithProgress(string filename, StringBuilder stringData)
{
    const int charBufferSize = 4096;
    using (FileStream fs = File.OpenRead(filename))
    using (BinaryReader br = new BinaryReader(fs))
    {
        long length = fs.Length;
        // Rough chunk count, used only to scale the progress percentage.
        int numberOfChunks = Convert.ToInt32(length / charBufferSize) + 1;
        double iter = 100 / Convert.ToDouble(numberOfChunks);
        double currentIter = 0;
        yield return Convert.ToInt32(currentIter);
        while (true)
        {
            // ReadChars returns a shorter (or empty) array at end of file.
            char[] buffer = br.ReadChars(charBufferSize);
            if (buffer.Length == 0) break;
            stringData.Append(buffer);
            currentIter += iter;
            yield return Convert.ToInt32(currentIter);
        }
    }
}

You can call it using the following:

string filename = "C:\\myfile.txt";
StringBuilder sb = new StringBuilder();
foreach (int progress in LoadFileWithProgress(filename, sb))
{
    // Update your progress counter here!
}
string fileData = sb.ToString();

As the file is loaded, the iterator will return the progress number from 0 to 100, which you can use to update your progress bar. Once the loop has finished, the StringBuilder will contain the contents of the text file.

Also, because you want text, we can just use a BinaryReader to read in characters, which will ensure that your buffers line up correctly when reading any multi-byte characters (UTF-8, UTF-16, etc.).

This is all done without using background tasks, threads, or complex custom state machines.

Extremeswank