tags:

views:

472

answers:

3

Hi everyone,

Having a problem with parsing a CSV file. I connect to the file using the following:

string connString = "Provider=Microsoft.Jet.OLEDB.4.0;"
          + "Data Source=\"" + dir + "\\\";"
          + "Extended Properties=\"text;HDR=No;FMT=Delimited\"";
        //create the database query
        string query = "SELECT * FROM [" + file + "]";

        //create a DataTable to hold the query results
        DataTable dTable = new DataTable();


        //create an OleDbDataAdapter to execute the query
        OleDbDataAdapter dAdapter = new OleDbDataAdapter(query, connString);

        //Get the CSV file to change position.

        //fill the DataTable
        dAdapter.Fill(dTable);

        return dTable;

For some reason, the first column reads as a "Header" ok (i.e. HDR=Yes allows the values to be displayed). The problem is when I have HDR=No, nothing after the first 'cell' is displayed in that row. However I need to have HDR=No as I'll be writing the CSV later.

As a quick aside, the rest of the row only has a value in every other column. Also, there is a period in each of these columns. Any help?

Cheers.

EDIT: Here are a fake few lines similar to the CSV:

//Problem row->>    
File:,GSK1.D,,GSK2.D,,GSK3.D,
//The following rows, however, are fine:
 / 69,120.3,16.37%,128.9,7.16%,188.92,13.97%
D / 71,48.57,75.50%,32.15,26.65%,58.35,71.43%
T / 89,35.87,45.84%,50.01,28.87%,15.38,43.30%

EDIT: When I put any value into the "blank spaces" above they are parsed, but no matter what I put into the problematic cells (e.g. GSK1.D) they won't parse - unless it is a number! Is there any chance it is automatically converting this cell to a "float" cell? And how can I stop it doing this?

+1  A: 

Hi David,

at Codeproject there is an parsing library: http://www.codeproject.com/KB/database/CsvReader.aspx

with an interesting article, how this stuff work. Its working faster (Author), than the OleDB Provider.

Andreas Hoffmann
Thanks for the link, but as mentioned in the comments in the previous answer I am most of the way there and would not like to rebuild now if possible.
David Archer
You can create a DataTable easy from an Reader and i'm thinking therse also an table adapter you can fill DataSets.
Andreas Hoffmann
A: 

I've seldom done well with database type access to text files - the possibilities for "issues" with the file tend to exceed theoretical time savings.

Personally I've more often than not hand crafted the code to do this. A lot (going back over 20+ years so generic solutions have been thin on the ground). That said, if I were having to process a .csv file now the first thing I'd reach for would be FileHelpers or similar.

Murph
Murph, have you (or anyone else here) used FileHelpers? Would you recommend it for processing large (~10-50GB) delimited files that may possibly contain malformed records (e.g. quote mismatches, incorrect number of fields per record, etc)?
Robert
I have but not for that sort of file size, most of the processing I've done is of much smaller files and the malformed stuff I dealt with was in code that predates (my finding of file helpers) so I had my own code to deal with.
Murph
+1  A: 

I have finished this, just to let anyone know who may have this problem in the future. It turns out the reason there was nothing being taken in was because ADO tries to determine a column type. If other values in this column are not of said type, it removes them completely.

To counter this, you need to create a schema.ini file, like so:

StreamWriter writer = new StreamWriter(File.Create(dir + "\\schema.ini"));
writer.WriteLine("[" + fileToBeRead + "]");
writer.WriteLine("ColNameHeader = False");
writer.WriteLine("Format = CSVDelimited");
writer.WriteLine("CharacterSet=ANSI");

int iColCount = dTable.Columns.Count + 1;
for (int i = 1; i < iColCount; i++)
{
    writer.WriteLine("Col" + i + "=Col" + i + "Name Char Width 20");
}


//writer.WriteLine("Col1=Col1Name Char Width 20");
//writer.WriteLine("Col2=Col1Name Char Width 20");
//etc.

writer.Close();

Thanks for everyone's suggestions!

David Archer
I just found your solution after coming to the same solution on my own project just now. +1 for wishing I had this an hour ago.
mrduclaw