Does anyone know of an open-source library that allows you to parse and read .csv files in C#?

+12  A: 

Take a look at A Fast CSV Reader on CodeProject.

Galwegian
Yeah, that one is great, a.k.a. LumenWorks.Framework.IO.Csv by Sébastien Lorion
codeulike
+1 I've used this with good results
Gabe Moothart
+1 Currently using it and works perfectly.
Eddie
+1 We've used it too.
TrueWill
Thanks, I used this Library ;)
diegocaro
+8  A: 

Here's one, written by yours truly, that uses generic collections and iterator blocks. It supports double-quote enclosed text fields (including ones that span multiple lines) using the doubled-quote escape convention (so "" inside a quoted field reads as a single double-quote character). It does not support:

  • Single-quote enclosed text
  • \ -escaped quoted text
  • alternate delimiters (won't yet work on pipe or tab delimited fields)

But all of those would be easy enough to add if you need them. I haven't benchmarked it anywhere (I'd love to see some results), but performance should be very good - better than anything that's .Split() based anyway.
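To make the quoting convention and the .Split() remark concrete, here is a small standalone sketch. It is not part of the class in this answer: QuoteConvention and Unquote are names made up for illustration, and the sample line is invented. It only shows what a naive Split does to a quoted field and what the doubled-quote rule means for a single field.

```csharp
using System;

public static class QuoteConvention
{
    // Hypothetical helper: strips the enclosing double quotes from one
    // quoted field and collapses each "" pair into a single " character.
    public static string Unquote(string field)
    {
        if (field.Length >= 2 && field[0] == '"' && field[field.Length - 1] == '"')
            return field.Substring(1, field.Length - 2).Replace("\"\"", "\"");
        return field; // not a quoted field, leave it alone
    }

    public static void Main()
    {
        // Three logical fields, the second one contains a comma.
        string line = "1,\"Smith, John\",42";

        // A naive split cuts the quoted field in half.
        string[] parts = line.Split(',');
        Console.WriteLine(parts.Length); // 4 pieces for 3 fields

        // The doubled-quote convention applied to a single field.
        Console.WriteLine(Unquote("\"He said \"\"hi\"\"\"")); // He said "hi"
    }
}
```

A character-by-character reader avoids the first problem because it knows whether it is inside quotes when it sees a comma.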

Update: I felt like adding single-quote enclosed text support, so the first limitation listed above no longer applies to the code below. It's a simple change, but I typed it right into the reply window so it's untested. Use the revision link at the bottom if you'd prefer the old (tested) code.

using System.Collections.Generic;
using System.IO;
using System.Text;

public static class CSV
{
    public static IEnumerable<IList<string>> FromFile(string fileName)
    {
        foreach (IList<string> item in FromFile(fileName, ignoreFirstLineDefault)) yield return item;
    }

    public static IEnumerable<IList<string>> FromFile(string fileName, bool ignoreFirstLine)
    {
        using (StreamReader rdr = new StreamReader(fileName))
        {
            foreach(IList<string> item in FromReader(rdr, ignoreFirstLine)) yield return item;
        }
    }

    public static IEnumerable<IList<string>> FromStream(Stream csv)
    {
        foreach (IList<string> item in FromStream(csv, ignoreFirstLineDefault)) yield return item;
    }

    public static IEnumerable<IList<string>> FromStream(Stream csv, bool ignoreFirstLine)
    {
        using (StreamReader rdr = new StreamReader(csv))
        {
            foreach (IList<string> item in FromReader(rdr, ignoreFirstLine)) yield return item;
        }
    }

    public static IEnumerable<IList<string>> FromReader(TextReader csv)
    {
        //Probably should have used TextReader instead of StreamReader
        foreach (IList<string> item in FromReader(csv, ignoreFirstLineDefault)) yield return item;
    }

    public static IEnumerable<IList<string>> FromReader(TextReader csv, bool ignoreFirstLine)
    {
        if (ignoreFirstLine) csv.ReadLine();

        IList<string> result = new List<string>();
        StringBuilder curValue = new StringBuilder();
        bool afterComma = false; //true when the last thing consumed was a field separator

        //TextReader has no EndOfStream; Read() returns -1 at end of input, so keep it as an int
        int c = csv.Read();
        while (c != -1)
        {
            switch (c)
            {
                case ',': //empty field
                    result.Add("");
                    afterComma = true;
                    c = csv.Read();
                    break;
                case '"': //qualified text
                case '\'':
                    int q = c;
                    c = csv.Read();
                    bool inQuotes = true;
                    while (c != -1 && inQuotes)
                    {
                        if (c == q)
                        {
                            c = csv.Read();
                            if (c != q) //a doubled quote is a literal quote; anything else ends the field
                                inQuotes = false;
                        }

                        if (inQuotes)
                        {
                            curValue.Append((char)c);
                            c = csv.Read();
                        }
                    }
                    result.Add(curValue.ToString());
                    curValue = new StringBuilder();
                    afterComma = (c == ',');
                    if (afterComma) c = csv.Read(); //skip the separator
                    break;
                case '\n': //end of the record
                case '\r':
                    //"\r\n" shows up as two line breaks; the second one finds nothing to return
                    if (afterComma) result.Add(""); //trailing empty field
                    afterComma = false;
                    if (result.Count > 0) //don't return empty records
                    {
                        yield return result;
                        result = new List<string>();
                    }
                    c = csv.Read();
                    break;
                default: //normal unqualified text
                    while (c != -1 && c != ',' && c != '\r' && c != '\n')
                    {
                        curValue.Append((char)c);
                        c = csv.Read();
                    }
                    result.Add(curValue.ToString());
                    curValue = new StringBuilder();
                    afterComma = (c == ',');
                    if (afterComma) c = csv.Read(); //skip the separator
                    break;
            }
        }
        if (afterComma) result.Add(""); //trailing empty field in the last record
        if (result.Count > 0)
            yield return result;
    }
    private static bool ignoreFirstLineDefault = false;
}
Joel Coehoorn
Can it handle commas inside quoted string? "like,this" ... and can it handle carriage returns in quoted strings? ... those are some of the things that tend to cause trouble...
codeulike
Yes, it can handle both. That's the whole point of quoted strings.
Joel Coehoorn
I still like this, but if I had it to do over I'd probably inherit TextReader
Joel Coehoorn
+3  A: 

I really like the FileHelpers library. It's fast, it's C# 100%, it's available for FREE, it's very flexible and easy to use.

Check it out - well worth a good look!

Marc

marc_s
The FileHelpers Wizard looks really useful at creating standard classes quickly.
John M
+2  A: 

The last time this question was asked, here's the answer I gave:

If you're just trying to read a CSV file with C#, the easiest thing is to use the Microsoft.VisualBasic.FileIO.TextFieldParser class. It's actually built into the .NET Framework, instead of being a third-party extension.

Yes, it is in Microsoft.VisualBasic.dll, but that doesn't mean you can't use it from C# (or any other CLR language).

Here's an example of usage, taken from the MSDN documentation:

Using MyReader As New _
Microsoft.VisualBasic.FileIO.TextFieldParser("C:\testfile.txt")
   MyReader.TextFieldType = FileIO.FieldType.Delimited
   MyReader.SetDelimiters(",")
   Dim currentRow As String()
   While Not MyReader.EndOfData
      Try
         currentRow = MyReader.ReadFields()
         Dim currentField As String
         For Each currentField In currentRow
            MsgBox(currentField)
         Next
      Catch ex As Microsoft.VisualBasic.FileIO.MalformedLineException
         MsgBox("Line " & ex.Message & _
         " is not valid and will be skipped.")
      End Try
   End While
End Using

Again, this example is in VB.NET, but it would be trivial to translate it to C#.
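For what it's worth, a rough C# version of the MSDN sample might look like the sketch below. It assumes the project references Microsoft.VisualBasic.dll; the class name TextFieldParserExample, the helper ReadRows, and the sample input are made up for illustration. It reads from a TextReader rather than a file path so it can be tried on an in-memory string, and it writes to the console instead of calling MsgBox.

```csharp
using System;
using System.Collections.Generic;
using System.IO;
using Microsoft.VisualBasic.FileIO;

public static class TextFieldParserExample
{
    // Hypothetical helper: reads every well-formed row from comma-delimited input.
    public static List<string[]> ReadRows(TextReader input)
    {
        var rows = new List<string[]>();
        using (var parser = new TextFieldParser(input))
        {
            parser.TextFieldType = FieldType.Delimited;
            parser.SetDelimiters(",");
            while (!parser.EndOfData)
            {
                try
                {
                    rows.Add(parser.ReadFields());
                }
                catch (MalformedLineException ex)
                {
                    // Same idea as the VB sample: report the bad line and move on.
                    Console.Error.WriteLine("Line " + ex.LineNumber + " is not valid and will be skipped.");
                }
            }
        }
        return rows;
    }

    public static void Main()
    {
        // Invented sample data; the quoted field contains a comma.
        var sample = new StringReader("a,\"b,c\",d\n1,2,3");
        foreach (string[] row in ReadRows(sample))
            Console.WriteLine(string.Join(" | ", row));
    }
}
```

To read a file instead, pass a StreamReader for the path, or use the TextFieldParser constructor that takes a file name as in the VB sample.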

Daniel Pryden