tags:

views:

419

answers:

1

I have some Excel file reading code that uses the OLEDB (Provider=Microsoft.Jet.OLEDB.4.0;Data Source={0};Extended Properties=Excel 8.0;) which works well but I keep encountering an issue whereby certain dates are returned as DBNull.

In the original XLS document, the format of dates that work (en-GB locale) are:

"02/04/2009  17:00:00" // returned as a System.DateTime

And the following style fails:

"08/Jan/09 11:24 AM" // returned as DBNull

Excel knows they're both dates (although I can't force them to style correctly) as the following correctly shows a date:

=DATE(YEAR(c),MONTH(c),DAY(c))  // where c = cell reference.

Is there a way, without altering the auto-generated original, to get the data?


EDIT for reference, here is my read-data method (assuming a dbAdapter is set up already -- note the DBNull doesn't come from the catch which isn't fired at all):

    public List<List<string>> GetData(string tableName, int maxColumns)
    {
        List<List<string>> rows = new List<List<string>>();

        DataSet ExcelDataSet = new DataSet();
        dbCommand.CommandText = @"SELECT * FROM [" + tableName + "]";
        dbAdapter.Fill(ExcelDataSet);

        DataTable table = ExcelDataSet.Tables[0];

        foreach (DataRow row in table.Rows)
        {
            List<string> data = new List<string>();

            for (int column = 0; column < maxColumns; column++)
            {
                try
                {
                    data.Add(row[column].ToString());
                }
                catch (Exception)
                {
                    data.Add(null);
                }
            }

            //  Stop processing at first blank row
            if ( string.IsNullOrEmpty(data[0]) ) break;
            rows.Add(data);
        }

        return rows;
    }
+2  A: 

I don't know if this will be helpful or not, but I have run into issues with Excel OLEDB code returning NULLs where I expected data and it almost always came back to a data type inference issue. Excel determines the datatype of a column based on the first x rows of data (I think x=10, could be wrong). I know you don't want to alter the file, but it might be worth trying to put the problem date style in the first 10 rows and see if it alters the behavior of your application.

Obviously if it does fix it, then that doesn't solve your problem. The only fixes in that case that I know of are to alter the file (put something in the first 10 rows that forces it to use the correct datatype). Sorry I can't offer a better solution, but hopefully at least I am helping you figure out what's causing your issue.

Chris Shaffer
Indeed it makes a difference, as it happened the data prior to 23rd Jan used one format and the data afterwards another. Changing the order of the data still caused an error at the point the data switched over (but with the switch in the other direction). I may need to use CSV instead...
Ray Hayes
If altering the original file IS an option, you could put an obvious string value in the column (near the top), thus forcing the column to parse as string. Then convert to DateTime yourself in the code. Not the best option, but it works...
Chris Shaffer
No, not an option. I'm trying to create a reporting system to be used by non-technical people and they'll get it wrong. I switched to parsing as Csv (automating Excel for the conversion) and can, as you say, parse the text directly.
Ray Hayes