views: 335

answers: 7

Is there any performance comparison of the System.IO.File.ReadAllxxx / WriteAllxxx methods vs. the StreamReader / StreamWriter classes available on the web? What do you think is the best way (from a performance perspective) to read/write text files in .NET 3.0?

When I checked the MSDN page for the System.IO.File class, the sample code uses StreamReader / StreamWriter for file operations. Is there any specific reason for avoiding the File.ReadAllxxx / WriteAllxxx methods, even though they look much easier to understand?

+3  A: 

The File.ReadAllText and similar methods use StreamReader/Writers internally, so performance should be comparable to whatever you do yourself.

I'd say go with the File.XXX methods whenever possible; they make your code a) easier to read and b) less likely to contain bugs (than any implementation you write yourself).
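
For comparison, a rough side-by-side of the two approaches (the file names are made up for illustration; the point is the difference in ceremony, not behaviour):

using System;
using System.IO;

class ReadWriteExample
{
    static void Main()
    {
        // One-liners using the convenience methods.
        string text = File.ReadAllText("input.txt");
        string[] lines = File.ReadAllLines("input.txt");
        File.WriteAllText("output.txt", text);
        Console.WriteLine("{0} characters, {1} lines", text.Length, lines.Length);

        // The hand-rolled equivalent of File.ReadAllText.
        using (StreamReader reader = new StreamReader("input.txt"))
        {
            string sameText = reader.ReadToEnd();
            Console.WriteLine(sameText.Length);
        }
    }
}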

Fredrik Kalseth
Thank you very much for your answer. I was also thinking along the same lines, but got confused when I saw the MSDN page I mentioned in my question.
Vijesh VP
A: 

@Fredrik Kalseth is right. The File.ReadXXX methods are just convenient wrappers around the StreamReader class.

For example, here is an implementation of File.ReadAllText:

public static string ReadAllText(string path, Encoding encoding)
{
    // Open a StreamReader over the file and read its entire contents in one call.
    using (StreamReader reader = new StreamReader(path, encoding))
    {
        return reader.ReadToEnd();
    }
}
aku
+5  A: 

You probably don't want to use File.ReadAllxxx / WriteAllxxx if you have any intention of supporting loading/saving of really large files.

In other words, for an editor that you intend to remain usable when editing gigabyte-sized files, you want a design built around StreamReader/StreamWriter and seeking, so that you load only the part of the file that is visible.
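
As a rough sketch of that kind of design (the method name and window handling are simplified assumptions; a real viewer would also have to deal with multi-byte encodings and line boundaries at the window edges):

using System.IO;
using System.Text;

class LargeFileViewer
{
    // Read roughly 'length' bytes starting at 'offset' without loading the whole file.
    static string ReadWindow(string path, long offset, int length)
    {
        using (FileStream fs = new FileStream(path, FileMode.Open, FileAccess.Read))
        {
            // Jump straight to the requested position instead of reading from the start.
            fs.Seek(offset, SeekOrigin.Begin);

            byte[] buffer = new byte[length];
            int read = fs.Read(buffer, 0, length);
            return Encoding.UTF8.GetString(buffer, 0, read);
        }
    }
}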

For anything without these (rare) requirements, I'd say take the easy route and use File.ReadAllxxx / WriteAllxxx. They just use the same StreamReader/Writer pattern internally as you'd code by hand anyway, as aku shows.

Tobi
+1  A: 

Unless you are doing something such as applying a multiline-matching regular expression to a text file, you generally want to avoid ReadAll/WriteAll. Doing things in smaller, more manageable chunks will almost always result in better performance.

For example, reading a table from a database and sending it to a client's web browser should be done in small sets that take advantage of small network messages and reduce the memory usage of the processing computer. There's no reason to buffer 10,000 records in memory on the web server and dump them all at once. The same goes for file systems. If you are concerned with the write performance of many small amounts of data - such as how the underlying file system allocates space and what the overhead is - you may find these articles enlightening:

Windows File Cache Usage

File Read Benchmarks

Clarification: if you are doing a ReadAll followed by a String.Split('\r') to get an array of all the lines in the file, and then using a for loop to process each line, that code will generally perform worse than reading the file line by line and processing each line as you go. This isn't a hard rule - if you have some processing that takes a large chunk of time, it's often better to release system resources (the file handle) sooner rather than later. However, when it comes to writing files, it's almost always better to dump the results of any transformative process (such as invoking ToString() on a large list of items) per item than to buffer them in memory.
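
A minimal sketch of that line-by-line pattern (the file names and the per-line "transformation" are made up for illustration):

using System.IO;

class LineByLineExample
{
    static void Main()
    {
        // Only the current line is held in memory, and each result is written
        // out immediately instead of being buffered into one big string.
        using (StreamReader reader = new StreamReader("input.txt"))
        using (StreamWriter writer = new StreamWriter("output.txt"))
        {
            string line;
            while ((line = reader.ReadLine()) != null)
            {
                writer.WriteLine(line.ToUpperInvariant());
            }
        }
    }
}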

cfeduke
A: 

The others have explained the performance, so I won't add to it. However, I will add that the MSDN code sample was likely written before .NET 2.0, when the helper methods were not available.

Richard Szalay
@Richard I was also thinking that. I just wanted to confirm I'm not overlooking anything here. Thanks for your answer.
Vijesh VP
A: 

This link has benchmarks for reading 50,000+ lines, and indicates that a StreamReader is about 40% faster.

http://dotnetperls.com/Content/File-Handling.aspx
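
If you want to reproduce that kind of comparison on your own files, a rough Stopwatch harness along these lines is enough (the file name is hypothetical; run it several times and alternate the order so the OS file cache doesn't favour one side):

using System;
using System.Diagnostics;
using System.IO;

class ReadBenchmark
{
    static void Main()
    {
        const string path = "big.txt"; // hypothetical large test file

        Stopwatch sw = Stopwatch.StartNew();
        string[] allLines = File.ReadAllLines(path);
        sw.Stop();
        Console.WriteLine("File.ReadAllLines: {0} ms ({1} lines)", sw.ElapsedMilliseconds, allLines.Length);

        sw = Stopwatch.StartNew();
        int count = 0;
        using (StreamReader reader = new StreamReader(path))
        {
            while (reader.ReadLine() != null)
            {
                count++;
            }
        }
        sw.Stop();
        Console.WriteLine("StreamReader.ReadLine: {0} ms ({1} lines)", sw.ElapsedMilliseconds, count);
    }
}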

torial
+1  A: 

This MSR (Microsoft Research) paper is a good start. They also document a number of point tools, such as IOSpeed and FragDisk, which you can use and test in your environment.

There is also an updated report/presentation you can read about how to maximise sequential IO. Very interesting stuff, as they debunk the "moving the HD head is the most time-consuming operation" myth; they also fully document their test environments and associated configurations, down to the motherboard, RAID controller and virtually any relevant information you would need to replicate their work. Some of the highlights are how the Opteron / Xeon matched up, but they then also compared them to a wildly hyped NEC Itanium (32 or 64 processors or something) for good measure. From the second link you can find a lot more resources on how to test and evaluate high-throughput scenarios and needs.

Some of the other MSR papers in this same research area offer guidance on where to maximise your spending (e.g. RAM, CPU, disk spindles, etc.) to accommodate your usage patterns... all very neat.

However, some of it is dated - but usually the older APIs are the faster/lower-level ones anyhow ;)

I currently push hundreds of thousands of TPS on a purpose-built app server, using a mix of C#, C++/CLI, native code and bitmap caching (rtl*bitmap).

Take care;

RandomNickName42