What is the most performant way to read a large CSV file in .NET? Should I use FileStream, or another class? Thanks!

A: 

You can use a stream, but read the file line by line.
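A minimal C# sketch of this approach, wrapping the FileStream the question asks about in a StreamReader and reading one line at a time. The helper name `CountLines`, the sample data, the 64 KB buffer size and `FileOptions.SequentialScan` are illustrative assumptions, not tuned values:

```csharp
using System;
using System.IO;

// Hypothetical helper: opens the file with an explicit FileStream, wraps it in a
// StreamReader, and processes it line by line. Buffer size and SequentialScan
// are illustrative choices, not measured optimums.
static long CountLines(string path)
{
    using var stream = new FileStream(
        path, FileMode.Open, FileAccess.Read, FileShare.Read,
        bufferSize: 1 << 16, options: FileOptions.SequentialScan);
    using var reader = new StreamReader(stream);

    long count = 0;
    string line;
    // ReadLine returns null at end of file; only the current line is held in memory.
    while ((line = reader.ReadLine()) != null)
    {
        count++; // placeholder for your own per-line CSV processing
    }
    return count;
}

// Usage: write a small sample file, then read it back line by line.
string sample = Path.GetTempFileName();
File.WriteAllLines(sample, new[] { "id,name", "1,alpha", "2,beta" });
Console.WriteLine(CountLines(sample)); // prints 3
```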

vc 74
What if the file contains a single line?
Darin Dimitrov
What do you mean? That the file is missing carriage returns?
vc 74
No, a CSV file is comma-delimited. You could have a single line and many columns.
Darin Dimitrov
@Darin: I hope the OP doesn't mean he has a *large* **one-line** CSV file. (I say this not condescendingly; at my work we actually once stored settings files in exactly this format. The one line ended up being thousands of characters.)
Dan Tao
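For the single-line case raised in these comments, ReadLine would still pull the entire line into memory. A minimal sketch that yields one field at a time instead, assuming simple CSV with no quoted fields (no embedded commas or line breaks); the `ReadFields` name is hypothetical:

```csharp
using System;
using System.Collections.Generic;
using System.IO;
using System.Text;

// Hypothetical helper: yields one field at a time, so memory use depends on the
// longest field rather than the longest line. Assumes no quoted fields.
static IEnumerable<string> ReadFields(TextReader reader)
{
    var field = new StringBuilder();
    int ch;
    // TextReader.Read returns -1 at end of input. StreamReader buffers internally,
    // so reading one char at a time does not mean one disk read per char.
    while ((ch = reader.Read()) != -1)
    {
        if (ch == ',' || ch == '\n')
        {
            // A comma or line break ends the current field.
            yield return field.ToString();
            field.Clear();
        }
        else if (ch != '\r')
        {
            field.Append((char)ch);
        }
    }
    // Emit the last field if the input did not end with a separator.
    if (field.Length > 0)
        yield return field.ToString();
}

// Usage: any TextReader works, e.g. a StreamReader over a huge one-line file.
using var input = new StringReader("a,b,c");
foreach (string f in ReadFields(input))
    Console.WriteLine(f); // prints a, b and c on separate lines
```

This sketch treats a line break like a comma, so it does not report which record a field came from; CSV that uses quoting needs a real parser.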
+1  A: 

If you want to read it all into memory, a simple File.ReadAllText() will do just fine.

EDIT: If your file is indeed very large, then you can use the StreamReader class; see here for details. This approach is sometimes necessary but should mostly be avoided for style reasons. See here for a more in-depth discussion.
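A short C# sketch contrasting the two options. `File.ReadLines` (available since .NET 4.0) is not mentioned in this answer; it is swapped in here as a convenient way to get StreamReader-backed, line-by-line reading. The sample file and its contents are illustrative:

```csharp
using System;
using System.IO;
using System.Linq;

string path = Path.GetTempFileName();
File.WriteAllLines(path, new[] { "id,name", "1,alpha", "2,beta" });

// File.ReadAllText loads the whole file into a single string:
// simple, but memory use grows with the file size.
string all = File.ReadAllText(path);
Console.WriteLine(all.Length);

// File.ReadLines returns a lazily evaluated IEnumerable<string>: lines are read
// only as the sequence is enumerated, so only the current line must be in memory.
int dataRows = File.ReadLines(path).Skip(1).Count(); // skip the header line
Console.WriteLine(dataRows); // prints 2
```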

Manu
Yes, but if the file is large, it's probably better to read it line by line.
vc 74
What is the "proper style" for reading large files?
Robert Harvey
+3  A: 

You can use the StreamReader returned by FileInfo.OpenText:

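Imports System.IO

Dim file As New FileInfo("path\to\file")

Using reader As StreamReader = file.OpenText()
    ' ReadLine returns one line at a time, so only the current line is held in memory.
    While Not reader.EndOfStream
        Dim nextLine As String = reader.ReadLine()
        ProcessCsvLine(nextLine) ' ProcessCsvLine is your own per-line handler
    End While
End Using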
Dan Tao
+1  A: 

The most efficient way of doing this is to take advantage of deferred execution in LINQ. You can create a simple LINQ-to-Text function that reads one line at a time, lets you work on it, and then continues. This is especially helpful when the file is very large.

I would avoid the ReadBlock and ReadToEnd methods of the StreamReader class, since they read a block of characters (which may span several lines) or even the entire file at once. This ends up consuming more memory than reading one line at a time.

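using System;
using System.Collections.Generic;
using System.IO;

// Extension methods must live in a non-nested static class.
public static class StreamReaderExtensions
{
    // Lazily yields one line at a time; nothing is read until the sequence is enumerated.
    public static IEnumerable<string> Lines(this StreamReader source)
    {
        // Validate eagerly; inside an iterator this check would be deferred.
        if (source == null)
            throw new ArgumentNullException(nameof(source));

        return LinesIterator(source);
    }

    private static IEnumerable<string> LinesIterator(StreamReader source)
    {
        string line;
        while ((line = source.ReadLine()) != null)
        {
            yield return line;
        }
    }
}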

Note that the function is an extension method of the StreamReader class. This means it can be used as follows:

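using System;
using System.IO;
using System.Linq;

class Program
{
    static void Main(string[] args)
    {
        using (StreamReader streamReader = new StreamReader("TextFile.txt"))
        {
            // Build the query; nothing is read from disk yet (deferred execution).
            // Assumes every line has at least four comma-separated columns.
            var tokens = from line in streamReader.Lines()
                         let items = line.Split(',')
                         select String.Format("{0}{1}{2}",
                             items[1].PadRight(16),
                             items[2].PadRight(16),
                             items[3].PadRight(16));

            // Enumerate while the reader is still open: each line is read,
            // split and formatted only when requested here.
            foreach (string row in tokens)
            {
                Console.WriteLine(row);
            }
        }
    }
}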
Waliaula Makokha
This seems like a lot of ceremony when you could simply use `while ((line = streamReader.ReadLine()) != null)` in your second code block.
Robert Harvey
A new way of doing things is not ceremony. LINQ is not a branch that monkeys swing on. Get on the train, dude! And what about the logic for tokenizing the line by commas? Will you use a for loop for that too?
Waliaula Makokha
+1  A: 

I have had a very good experience with this library:

http://www.codeproject.com/KB/database/CsvReader.aspx

Adesit