views:

357

answers:

6

We have a text file with about 100,000 rows, about 50 columns per row, most of the data is pretty small (5 to 10 characters or numbers).

This is a pretty simple task, but I'm just wondering: what would be the best way to import this data into a C# data structure (for example, a DataTable)?

+7  A: 

I would read it in as a CSV, using tab as the column delimiter:

A Fast CSV Reader

Edit:
Here's a barebones example of what you'd need:

// Requires the Fast CSV Reader library (LumenWorks.Framework.IO.Csv).
// Second argument: false = the file has no header row; third: tab delimiter.
DataTable dt = new DataTable();
using (CachedCsvReader csv = new CachedCsvReader(new StreamReader(CSV_FULLNAME), false, '\t')) {
    dt.Load(csv);
}

Where CSV_FULLNAME is the full path + filename of your tab delimited CSV.

Jay Riggs
We use this CSV reader in our app, it's great.
Sam
I'll check it out; does anyone happen to have sample code for using it with tab column delimiters?
alchemical
I'd suggest going with something like this. You can also use the Access/Jet driver for ADO.NET access, but there's sometimes a bit of quirkiness there.
Tracker1
@AirMan - I added a simple sample.
Jay Riggs
I implemented this and it works great; I was then told corp policy prohibits open source, so I hacked together an object that takes twice as long. Fast CSV Reader seems like a great way to go if you can use it--the developer even responded to a couple of questions within a few minutes.
alchemical
A: 

Two options:

  1. Use the classes in the System.Data.OleDb namespace. This has the advantage of reading directly into a DataTable, as you asked, with very little code, but it can be tricky to get right because the file is tab- rather than comma-delimited.
  2. Use or write a CSV parser. Make sure it's a state machine-based parser like the one @Jay Riggs linked to, rather than a String.Split()-based parser. This should be faster than the OleDb method, but it will give you a List or array rather than a DataTable.
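
To illustrate option 1, here's a rough sketch (the class and method names are mine, not from the answer): the Jet text driver can fill a DataTable directly, but the tab delimiter has to be declared in a schema.ini file next to the data file, since tabs aren't the driver's default:

```csharp
using System.Data;
using System.Data.OleDb;
using System.IO;

public static class OleDbTabImport
{
    // schema.ini declares the file tab-delimited for the Jet text driver.
    public static string BuildSchemaIni(string fileName) =>
        $"[{fileName}]\r\nFormat=TabDelimited\r\nColNameHeader=False\r\n";

    public static DataTable Load(string folder, string fileName)
    {
        File.WriteAllText(Path.Combine(folder, "schema.ini"),
            BuildSchemaIni(fileName));

        // The Data Source is the *folder*; the file is the "table".
        string connStr = "Provider=Microsoft.Jet.OLEDB.4.0;" +
                         $"Data Source={folder};" +
                         "Extended Properties=\"text;HDR=No;FMT=TabDelimited\"";

        var dt = new DataTable();
        using (var conn = new OleDbConnection(connStr))
        using (var cmd = new OleDbCommand($"SELECT * FROM [{fileName}]", conn))
        {
            conn.Open();
            using (var reader = cmd.ExecuteReader())
                dt.Load(reader);
        }
        return dt;
    }
}
```

Note the Jet provider is 32-bit and Windows-only, which is part of the quirkiness mentioned above.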
Joel Coehoorn
Just curious why you suggest a state-based parser as opposed to String.Split().
Andy White
Performance, mainly. But also that it's hard to get things like quoted-text right with string.split.
Joel Coehoorn
Also, by "state machine" I _don't_ mean Regex, which is just about as bad. I mean a state machine purpose-built for this task that can better handle descent/recursion.
Joel Coehoorn
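
To illustrate the quoting problem: here's a minimal state-machine sketch for splitting a single tab-delimited line (the TsvLine helper is hypothetical, not the linked reader). Unlike String.Split, it keeps delimiters that appear inside quoted fields:

```csharp
using System.Collections.Generic;
using System.Text;

public static class TsvLine
{
    // Splits one delimited line, honoring double-quoted fields that may
    // contain the delimiter, and "" as an escaped quote inside quotes.
    public static List<string> Split(string line, char delimiter = '\t')
    {
        var fields = new List<string>();
        var current = new StringBuilder();
        bool inQuotes = false;

        for (int i = 0; i < line.Length; i++)
        {
            char c = line[i];
            if (inQuotes)
            {
                if (c == '"' && i + 1 < line.Length && line[i + 1] == '"')
                {
                    current.Append('"'); // escaped quote inside quotes
                    i++;
                }
                else if (c == '"')
                    inQuotes = false;    // closing quote
                else
                    current.Append(c);   // delimiter is literal in here
            }
            else if (c == '"' && current.Length == 0)
                inQuotes = true;         // opening quote
            else if (c == delimiter)
            {
                fields.Add(current.ToString());
                current.Clear();
            }
            else
                current.Append(c);
        }
        fields.Add(current.ToString());
        return fields;
    }
}
```

A String.Split on the same input would cut the quoted field in half at the embedded tab.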
+1  A: 

What about FileHelpers? You can define the tab as a delimiter. Head on over to that site via the link supplied and have a peeksy.

Hope this helps, Best regards, Tom.

tommieb75
A: 

However you parse the lines, make sure that whatever becomes the data source of your data grid supports forwarding and rewinding. You don't want to load everything into memory first, do you? What if the amount of data is ten-fold the next time? Build something that uses file seeking deep down rather than reading everything into memory first. That's my advice.
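
A minimal sketch of the streaming idea (assuming a plain tab split per line, with no quoting): File.ReadLines yields one line at a time, unlike ReadAllLines, so memory use stays flat as the file grows:

```csharp
using System.Collections.Generic;
using System.IO;

public static class LazyRows
{
    // Streams the file row by row; nothing is materialized up front.
    public static IEnumerable<string[]> Read(string path)
    {
        foreach (string line in File.ReadLines(path))
            yield return line.Split('\t');
    }
}
```

This gives forward-only enumeration; true rewinding to an arbitrary row would need Stream.Seek and remembered offsets, as the answer suggests.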

Erik A. Brandstadmoen
A: 

Simple, but not necessarily a great way:

  • Read the file using a text reader into a string

  • Use String.Split to get the rows

  • Use String.Split with a tab character to get the field values
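
Those three steps might look like the sketch below (the SplitImport class name is just illustrative). Note it does no quote handling, which is part of why this isn't necessarily a great way:

```csharp
using System.Data;
using System.IO;

public static class SplitImport
{
    public static DataTable Load(string path)
    {
        // Step 1: read the file; ReadAllLines also does the row split.
        string[] lines = File.ReadAllLines(path);

        var dt = new DataTable();
        if (lines.Length == 0) return dt;

        // Size the columns from the first row.
        int cols = lines[0].Split('\t').Length;
        for (int i = 0; i < cols; i++)
            dt.Columns.Add("Col" + i);

        // Step 2/3: split each row on tabs and add it.
        foreach (string line in lines)
            dt.Rows.Add(line.Split('\t'));
        return dt;
    }
}
```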

Andrew Lewis
Just a reminder: there is string[] System.IO.File.ReadAllLines(string path)
DK
+2  A: 

Use .NET's built-in text parser. It is free, has great error handling, and deals with a lot of oddball cases.

http://msdn.microsoft.com/en-us/library/microsoft.visualbasic.fileio.textfieldparser(VS.80).aspx
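
For reference, a rough sketch of using TextFieldParser from C# (add a reference to the Microsoft.VisualBasic assembly; the wrapper class name here is mine):

```csharp
using System.Data;
using Microsoft.VisualBasic.FileIO;

public static class TextFieldParserImport
{
    public static DataTable Load(string path)
    {
        var dt = new DataTable();
        using (var parser = new TextFieldParser(path))
        {
            parser.TextFieldType = FieldType.Delimited;
            parser.SetDelimiters("\t");
            parser.HasFieldsEnclosedInQuotes = true; // one of the odd cases it handles

            while (!parser.EndOfData)
            {
                string[] fields = parser.ReadFields();
                if (dt.Columns.Count == 0)
                    for (int i = 0; i < fields.Length; i++)
                        dt.Columns.Add("Col" + i);
                dt.Rows.Add(fields);
            }
        }
        return dt;
    }
}
```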

Jonathan Allen
Is this VB only?
alchemical
Nope, it just happens to be in the VB namespace because that team built it. You can use it from any .NET language.
Jonathan Allen