views: 5369

answers: 12

I realize this is a newbie question, but I'm looking for a simple solution - it seems like there should be one.

What's the best way to import a CSV file into a strongly-typed data structure? Again, simple = better.

Thanks.

+25  A: 

Check out FileHelpers.
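
For example, here's a minimal sketch of typical FileHelpers usage. PersonRecord, people.csv, and the date format are made up for illustration; FileHelpers maps public fields to columns in declaration order.

using System;
using FileHelpers;

[DelimitedRecord(",")]
public class PersonRecord
{
    public int Id;
    public string Name;

    // The date format here is an assumption about the input file.
    [FieldConverter(ConverterKind.Date, "yyyy-MM-dd")]
    public DateTime DateOfBirth;
}

// Elsewhere:
FileHelperEngine engine = new FileHelperEngine(typeof(PersonRecord));
PersonRecord[] people = (PersonRecord[])engine.ReadFile("people.csv");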

NotMyself
Unfortunately this is LGPL, which is less than ideal in a corporate environment...
John Weldon
@John, why do you say that? LGPL doesn't require you to release any code unless you modify the library itself. (In which case it would make sense to submit a patch anyway.)
dangph
+1 Just implemented this...awesome
Miyagi Coder
@dangph I don't think that's quite true. http://www.opensource.org/licenses/lgpl-2.1.php states "However, linking a "work that uses the Library" with the Library creates an executable that is a derivative of the Library ... The executable is therefore covered by this License. Section 6 states terms for distribution of such executables. "
Zeus
@Zeus, I still don't think you have to release the source of your "work that uses the Library". You need to release "object code and/or source code". I'm not sure what that means in a .Net environment. But you are right. The requirements of section 6 are extremely onerous. What a ridiculous license.
dangph
@dangph I agree, I'm not sure quite what to make of it!
Zeus
FileHelpers homepage says: "FileHelpers Library is @Copyright 2005-2006 to Marcos Meli but it's source code and the binaries are free for commercial and non commercial use."
Carl Hörberg
Another problem with FileHelpers is that development on it seems to have completely stalled since about 2007. And unfortunately it contains bugs. (Probably it would work fine for simple cases.) Even though it's open source, it's not clear that the author is accepting patches.
dangph
+1  A: 

A good simple way to do it is to open the file and read each line into an array, linked list, or data structure of your choice. Be careful about handling the first line, though; it's often a header row rather than data.
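
A minimal sketch of that approach (ReadCsvRows is a made-up helper name, and the naive Split will break on quoted fields that contain commas):

using System.Collections.Generic;
using System.IO;

static List<string[]> ReadCsvRows(string path)
{
    List<string[]> rows = new List<string[]>();
    using (StreamReader reader = new StreamReader(path))
    {
        string header = reader.ReadLine(); // consume the header row separately
        string line;
        while ((line = reader.ReadLine()) != null)
        {
            rows.Add(line.Split(',')); // naive: assumes no quoted/embedded commas
        }
    }
    return rows;
}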

This may be more than you need, but there also seems to be a direct way to access CSV files using a connection string.

Why not try using Python instead of C# or VB? It has a nice built-in CSV module that does all the heavy lifting for you.

EDIT: @NotMyself - just because we commented poorly on your trolling question, please don't vote down our answers. sasb and I would appreciate it.

contagious
Don't jump to Python from VB just for the sake of a CSV parser. There's one built into VB, although weirdly it seems to have been ignored in the answers to this question. http://msdn.microsoft.com/en-us/library/microsoft.visualbasic.fileio.textfieldparser.aspx
MarkJ
A: 

If the file is small:

Read each line, tokenize what you've read, and assign each value to its respective place in the data structure.
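
For example, a minimal sketch of that approach using LINQ (Order, orders.csv, and the column layout are invented for illustration; the naive Split assumes no quoted or embedded commas):

using System;
using System.IO;
using System.Linq;

class Order
{
    public int Id;
    public string Product;
    public decimal Price;
}

class Program
{
    static void Main()
    {
        var orders = File.ReadAllLines("orders.csv")
            .Skip(1) // skip the header row
            .Select(line => line.Split(','))
            .Select(f => new Order
            {
                Id = int.Parse(f[0]),
                Product = f[1],
                Price = decimal.Parse(f[2])
            })
            .ToList();

        Console.WriteLine("{0} orders read", orders.Count);
    }
}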

saniul
A: 

If you can guarantee that there are no commas in the data, then the simplest way would probably be to use String.Split.

For example:

String[] values = myString.Split(',');
myObject.StringField = values[0];
myObject.IntField = Int32.Parse(values[1]);

There may be libraries you could use to help, but that's probably as simple as you can get. Just make sure there can't be commas in the data; otherwise, you will need a proper parser.

Mike Stone
this is not an optimal solution
Miau
+2  A: 

Two articles on CodeProject provide code for a solution: one uses a StreamReader, and the other imports CSV data using the Microsoft Text Driver.

Yaakov Ellis
+3  A: 

I was bored, so I modified some stuff I wrote. It tries to encapsulate the parsing in an OO manner while cutting down on the number of iterations through the file; it only iterates once, at the top foreach.

using System;
using System.Collections.Generic;
using System.Linq;
using System.Text;
using System.IO;

namespace ConsoleApplication1
{
    class Program
    {
        static void Main(string[] args)
        {
            // Usage. Note this won't run as-is, because GetStream() is not
            // implemented, but it should get you started.
            CSVFileParser fileParser = new CSVFileParser();
            // TODO: configure fileParser

            PersonParser personParser = new PersonParser(fileParser);

            List<Person> persons = new List<Person>();
            // If the file is large and there is a good way to limit the rows
            // without having to reparse the whole file, you can use a LINQ
            // query if you desire.
            foreach (Person person in personParser.GetPersons())
            {
                persons.Add(person);
            }

            // Now we have a list of Person objects.
        }
    }

    public abstract class CSVParser
    {
        protected String[] delimiters = { "," };

        protected internal IEnumerable<String[]> GetRecords()
        {
            using (StreamReader reader = new StreamReader(GetStream()))
            {
                while (!reader.EndOfStream)
                {
                    yield return reader.ReadLine().Split(delimiters,
                        StringSplitOptions.None);
                }
            }
        }

        protected abstract Stream GetStream();
    }

    public class CSVFileParser : CSVParser
    {
        // TODO: add logic to get a stream from a file
        protected override Stream GetStream()
        {
            throw new NotImplementedException();
        }
    }

    public class CSVWebParser : CSVParser
    {
        // TODO: add logic to get a stream from a web request
        protected override Stream GetStream()
        {
            throw new NotImplementedException();
        }
    }

    public class Person
    {
        public String Name { get; set; }
        public String Address { get; set; }
        public DateTime DOB { get; set; }
    }

    public class PersonParser
    {
        public PersonParser(CSVParser parser)
        {
            this.Parser = parser;
        }

        public CSVParser Parser { get; set; }

        public IEnumerable<Person> GetPersons()
        {
            foreach (String[] record in this.Parser.GetRecords())
            {
                yield return new Person()
                {
                    Name = record[0],
                    Address = record[1],
                    DOB = DateTime.Parse(record[2]),
                };
            }
        }
    }
}

Brian Leahy
+3  A: 

Brian gives a nice solution for converting it to a strongly typed collection.

Most of the CSV parsing methods given don't take into account escaping fields or some of the other subtleties of CSV files (like trimming fields). Here is the code I personally use. It's a bit rough around the edges and has pretty much no error reporting.

public static IList<IList<string>> Parse(string content)
{
    // Requires System.Collections.Generic, System.IO, and System.Text.
    IList<IList<string>> records = new List<IList<string>>();

    StringReader stringReader = new StringReader(content);

    bool inQuotedString = false;
    IList<string> record = new List<string>();
    StringBuilder fieldBuilder = new StringBuilder();
    while (stringReader.Peek() != -1)
    {
        char readChar = (char)stringReader.Read();

        if (readChar == '\n' || (readChar == '\r' && stringReader.Peek() == '\n'))
        {
            // If it's a \r\n combo, consume the \n part and throw it away.
            if (readChar == '\r')
            {
                stringReader.Read();
            }

            if (inQuotedString)
            {
                if (readChar == '\r')
                {
                    fieldBuilder.Append('\r');
                }
                fieldBuilder.Append('\n');
            }
            else
            {
                record.Add(fieldBuilder.ToString().TrimEnd());
                fieldBuilder = new StringBuilder();

                records.Add(record);
                record = new List<string>();

                inQuotedString = false;
            }
        }
        else if (fieldBuilder.Length == 0 && !inQuotedString)
        {
            if (char.IsWhiteSpace(readChar))
            {
                // Ignore leading whitespace.
            }
            else if (readChar == '"')
            {
                inQuotedString = true;
            }
            else if (readChar == ',')
            {
                record.Add(fieldBuilder.ToString().TrimEnd());
                fieldBuilder = new StringBuilder();
            }
            else
            {
                fieldBuilder.Append(readChar);
            }
        }
        else if (readChar == ',')
        {
            if (inQuotedString)
            {
                fieldBuilder.Append(',');
            }
            else
            {
                record.Add(fieldBuilder.ToString().TrimEnd());
                fieldBuilder = new StringBuilder();
            }
        }
        else if (readChar == '"')
        {
            if (inQuotedString)
            {
                // A doubled quote inside a quoted field is an escaped quote.
                if (stringReader.Peek() == '"')
                {
                    stringReader.Read();
                    fieldBuilder.Append('"');
                }
                else
                {
                    inQuotedString = false;
                }
            }
            else
            {
                fieldBuilder.Append(readChar);
            }
        }
        else
        {
            fieldBuilder.Append(readChar);
        }
    }
    record.Add(fieldBuilder.ToString().TrimEnd());
    records.Add(record);

    return records;
}

Note that this doesn't handle the edge case of a field that isn't delimited by double quotes but merely has a quoted string inside it. See this post for a better explanation as well as links to some proper libraries.

ICR
A: 

I had to use a CSV parser in .NET for a project this summer and settled on the Microsoft Jet Text Driver. You specify a folder using a connection string, then query a file using a SQL Select statement. You can specify strong types using a schema.ini file. I didn't do this at first, but then I was getting bad results where the type of the data wasn't immediately apparent, such as IP numbers or an entry like "XYQ 3.9 SP1".

One limitation I ran into is that it cannot handle column names longer than 64 characters; it truncates them. This shouldn't normally be a problem, except that I was dealing with very poorly designed input data. It returns an ADO.NET DataSet.

This was the best solution I found. I would be wary of rolling my own CSV parser, since I would probably miss some of the edge cases, and I didn't find any other free CSV parsing packages for .NET out there.

EDIT: Also, there can only be one schema.ini file per directory, so I dynamically appended to it to strongly type the needed columns. It will only strongly-type the columns specified and infer the type of any unspecified field. I really appreciated this, as I was importing a fluid 70+ column CSV and didn't want to specify each column, only the misbehaving ones (see the sample entry below).
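
For reference, a schema.ini entry for this kind of setup might look like the following (the file name, column positions, and column names are invented; per the behavior described above, unlisted columns fall back to type inference):

[import.csv]
ColNameHeader=True
Format=CSVDelimited
Col1=ID Long
Col5=IPAddress Text
Col12=Version Text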

pbh101
Why not the built-in VB.NET CSV parser? http://msdn.microsoft.com/en-us/library/microsoft.visualbasic.fileio.textfieldparser.aspx
MarkJ
+7  A: 

If you're expecting fairly complex scenarios for CSV parsing, don't even think of rolling your own parser. There are a lot of excellent tools out there, like FileHelpers, or even ones from CodeProject.

The point is that this is a fairly common problem, and you can bet that a lot of software developers have already thought about and solved it.

Jon Limjap
+3  A: 

I agree with @NotMyself. FileHelpers is well tested and handles all kinds of edge cases that you'll eventually have to deal with if you do it yourself. Take a look at what FileHelpers does, and only write your own if you're absolutely sure that either (1) you will never need to handle the edge cases FileHelpers does, or (2) you love writing this kind of stuff and will be overjoyed when you have to parse stuff like this:

1,"Bill","Smith","Supervisor", "No Comment"

2 , 'Drake,' , 'O'Malley',"Janitor,

Oops, I'm not quoted and I'm on a new line!

Jon Galloway
+3  A: 

Use an OleDB connection.


// The Data Source in the connection string is the directory that holds the
// file; the file name itself goes in the SELECT statement.
String sConnectionString = "Provider=Microsoft.Jet.OLEDB.4.0;Data Source=C:\\InputDirectory\\;Extended Properties='text;HDR=Yes;FMT=Delimited'";
OleDbConnection objConn = new OleDbConnection(sConnectionString);
objConn.Open();

DataTable dt = new DataTable();
OleDbCommand objCmdSelect = new OleDbCommand("SELECT * FROM file.csv", objConn);
OleDbDataAdapter objAdapter1 = new OleDbDataAdapter();
objAdapter1.SelectCommand = objCmdSelect;
objAdapter1.Fill(dt);
objConn.Close();
Kevin
+4  A: 

You could use the TextFieldParser that Microsoft supplies with the .NET Framework.

If you're using VB.NET, it seems like the obvious choice (the question is tagged both VB.NET and C#). You can use it from C# as well; even though it lives in the Microsoft.VisualBasic namespace, it still ships with the .NET Framework.
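
For example, a minimal sketch from C# ("data.csv" is a placeholder; you need a project reference to Microsoft.VisualBasic.dll):

using Microsoft.VisualBasic.FileIO;

using (TextFieldParser parser = new TextFieldParser("data.csv"))
{
    parser.TextFieldType = FieldType.Delimited;
    parser.SetDelimiters(",");
    parser.HasFieldsEnclosedInQuotes = true; // copes with quoted fields containing commas

    while (!parser.EndOfData)
    {
        string[] fields = parser.ReadFields();
        // map fields onto your strongly-typed object here
    }
}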

MarkJ