I've been using an OleDb connection to read Excel files successfully for quite a while now, but I've run into a problem. Someone is trying to upload an Excel spreadsheet with nothing in the first column, and when I read the file, that column is not recognized.

I'm currently using the following OleDb connection string:

Provider=Microsoft.Jet.OLEDB.4.0;
Data Source=c:\test.xls;
Extended Properties="Excel 8.0;IMEX=1;"

So, if there are 13 columns in the Excel file, the OleDbDataReader I get back only has 12 columns/fields.

Any insight would be appreciated.

+1  A: 

We always use Excel Interop to open the spreadsheet and parse directly (e.g. similar to how you would scan through cells in VBA), or we create locked down templates that enforce certain columns to be filled in before the user can save the data.

Jess
Go with the interop library. LL is right.
KevDog
A: 

If you could require the Excel sheet to have column headers, then you would always have the 13 columns. You would just need to skip the header row when processing.

This would also correct situations where the user puts the columns in an order that you are not expecting. (detect column indexes in the header row and read appropriately)

I see that others are recommending the Excel interop, but jeez that's a slow option compared to the OleDb way. Plus it requires Excel or OWC to be installed on the server (licensing).
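The header-detection idea above can be sketched roughly as follows. This is only an illustration, assuming the header row has already been read into a list of cell values; `HeaderMap.Build` and the sample header names are hypothetical and not part of any library.

```csharp
using System;
using System.Collections.Generic;

static class HeaderMap
{
    // Hypothetical helper: maps each non-empty header cell to its zero-based
    // column index. Names are trimmed and compared case-insensitively; if a
    // name appears twice, the first occurrence wins.
    public static Dictionary<string, int> Build(IList<object> headerRow)
    {
        var map = new Dictionary<string, int>(StringComparer.OrdinalIgnoreCase);
        for (int i = 0; i < headerRow.Count; i++)
        {
            // Convert.ToString returns an empty string for null and DBNull.
            string name = Convert.ToString(headerRow[i]).Trim();
            if (name.Length > 0 && !map.ContainsKey(name))
                map[name] = i;
        }
        return map;
    }
}

static class Program
{
    static void Main()
    {
        // Assumed sample data: a header row with one blank cell.
        var header = new List<object> { "Name", "Email", DBNull.Value, "Phone" };
        Dictionary<string, int> map = HeaderMap.Build(header);

        Console.WriteLine(map["email"]); // prints 1
        Console.WriteLine(map["Phone"]); // prints 3
    }
}
```

Looking fields up by name rather than by fixed position means the processing code does not depend on how many columns the reader returned or in what order.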

StingyJack
The files currently have a header row. Even when I tell the OleDb to include the header row (using HDR=NO), it still only returns 12 columns and skips the first column.
Austin
That HDR option sounds backwards... check (http://www.connectionstrings.com/excel) as a reference for the conn string.
StingyJack
I know it sounds backwards, but you set HDR=NO to tell it to give you the header row (basically, you are saying the header row is a data row)
Austin
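For reference, the HDR setting discussed in this comment thread could be added to the question's connection string along these lines. This is a minimal sketch; `ExcelConnection.Build` is a hypothetical helper, and the path is the one from the question.

```csharp
using System;

static class ExcelConnection
{
    // Hypothetical helper: builds a Jet connection string for an .xls file.
    // HDR=NO makes the provider treat the first row as data, so the header
    // row comes back as the first record; HDR=YES uses it for column names.
    public static string Build(string path, bool firstRowIsData)
    {
        string hdr = firstRowIsData ? "NO" : "YES";
        return "Provider=Microsoft.Jet.OLEDB.4.0;" +
               "Data Source=" + path + ";" +
               "Extended Properties=\"Excel 8.0;HDR=" + hdr + ";IMEX=1;\"";
    }
}

static class Program
{
    static void Main()
    {
        Console.WriteLine(ExcelConnection.Build(@"c:\test.xls", true));
    }
}
```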
Ah... OK. I get it. Did you try also setting IMEX off?
StingyJack
You could... try to read with OleDb, and if you only get 12 columns, use the COM object to insert a non-affecting value into a few rows of the first column. The non-affecting value could be removed when you go to read the spreadsheet back in. Kinda hacky, but that way you don't have to read everything with the COM object.
StingyJack
A: 

You might try using Excel and COM. That way, you'll be getting your info straight from the horse's mouth, as it were.

From D. Anand over on the MSDN forums:

Create a reference in your project to Excel Objects Library. The excel object library can be added in the COM tab of adding reference dialog.

Here's some info on the Excel object model in C# http://msdn.microsoft.com/en-us/library/aa168292(office.11).aspx

The horse's mouth takes a while to chew, so this won't be so great for large (> 1000 row) files.
StingyJack
Also, the horse's mouth doesn't run well in a server environment, if this is a server environment.
JoshBerke
Yeah, I'd like to avoid going the COM or Interop route because of the speed issue; but that may be what we have to do. Any other ideas before I go that way?
Austin
Ah, good points all. My solution is a poor one.
Not a poor one. It may be the only way to handle this.
StingyJack
+2  A: 

SpreadsheetGear for .NET gives you an API for working with xls and xlsx workbooks from .NET. It is easier to use and faster than OleDB or the Excel COM object model. You can see the live samples or try it for yourself with the free trial.

Disclaimer: I own SpreadsheetGear LLC

EDIT:

StingyJack commented "Faster than OleDb? Better back that claim up".

This is a reasonable request. I see claims all the time which I know for a fact to be false, so I cannot blame anyone for being skeptical.

Below is the code to create a 50,000 row by 10 column workbook with SpreadsheetGear, save it to disk, and then sum the numbers using OleDb and SpreadsheetGear. SpreadsheetGear reads the 500K cells in 0.31 seconds compared to 0.63 seconds with OleDB - just over twice as fast. SpreadsheetGear actually creates and reads the workbook in less time than it takes to read the workbook with OleDB.

The code is below. You can try it yourself with the SpreadsheetGear free trial.

using System;
using System.Data; 
using System.Data.OleDb; 
using SpreadsheetGear;
using SpreadsheetGear.Advanced.Cells;
using System.Diagnostics;

namespace SpreadsheetGearAndOleDBBenchmark
{
    class Program
    {
        static void Main(string[] args)
        {
            // Warm up (get the code JITed).
            BM(10, 10);

            // Do it for real.
            BM(50000, 10);
        }

        static void BM(int rows, int cols)
        {
            // Compare the performance of OleDB to SpreadsheetGear for reading
            // workbooks. We sum numbers just to have something to do.
            //
            // Run on Windows Vista 32 bit, Visual Studio 2008, Release Build,
            // Run Without Debugger:
            //  Create time: 0.25 seconds
            //  OleDb Time: 0.63 seconds
            //  SpreadsheetGear Time: 0.31 seconds
            //
            // SpreadsheetGear is more than twice as fast at reading. Furthermore,
            // SpreadsheetGear can create the file and read it faster than OleDB
            // can just read it.
            string filename = @"C:\tmp\SpreadsheetGearOleDbBenchmark.xls";
            Console.WriteLine("\nCreating {0} rows x {1} columns", rows, cols);
            Stopwatch timer = Stopwatch.StartNew();
            double createSum = CreateWorkbook(filename, rows, cols);
            double createTime = timer.Elapsed.TotalSeconds;
            Console.WriteLine("Create sum of {0} took {1} seconds.", createSum, createTime);
            timer = Stopwatch.StartNew();
            double oleDbSum = ReadWithOleDB(filename);
            double oleDbTime = timer.Elapsed.TotalSeconds;
            Console.WriteLine("OleDb sum of {0} took {1} seconds.", oleDbSum, oleDbTime);
            timer = Stopwatch.StartNew();
            double spreadsheetGearSum = ReadWithSpreadsheetGear(filename);
            double spreadsheetGearTime = timer.Elapsed.TotalSeconds;
            Console.WriteLine("SpreadsheetGear sum of {0} took {1} seconds.", spreadsheetGearSum, spreadsheetGearTime);
        }

        static double CreateWorkbook(string filename, int rows, int cols)
        {
            IWorkbook workbook = Factory.GetWorkbook();
            IWorksheet worksheet = workbook.Worksheets[0];
            IValues values = (IValues)worksheet;
            double sum = 0.0;
            Random rand = new Random();
            // Put labels in the first row.
            foreach (IRange cell in worksheet.Cells[0, 0, 0, cols - 1])
                cell.Value = "Cell-" + cell.Address;
            // Using IRange and foreach would be less code,
            // but we'll do it the fast way.
            for (int row = 1; row <= rows; row++)
            {
                for (int col = 0; col < cols; col++)
                {
                    double number = rand.NextDouble();
                    sum += number;
                    values.SetNumber(row, col, number);
                }
            }
            workbook.SaveAs(filename, FileFormat.Excel8);
            return sum;
        }

        static double ReadWithSpreadsheetGear(string filename)
        {
            IWorkbook workbook = Factory.GetWorkbook(filename);
            IWorksheet worksheet = workbook.Worksheets[0];
            IValues values = (IValues)worksheet;
            IRange usedRange = worksheet.UsedRange;
            int rowCount = usedRange.RowCount;
            int colCount = usedRange.ColumnCount;
            double sum = 0.0;
            // We could use foreach (IRange cell in usedRange) for cleaner 
            // code, but this is faster.
            for (int row = 1; row <= rowCount; row++)
            {
                for (int col = 0; col < colCount; col++)
                {
                    IValue value = values[row, col];
                    if (value != null && value.Type == SpreadsheetGear.Advanced.Cells.ValueType.Number)
                        sum += value.Number;
                }
            }
            return sum;
        }

        static double ReadWithOleDB(string filename)
        {
            String connectionString =  
                "Provider=Microsoft.Jet.OLEDB.4.0;" + 
                "Data Source=" + filename + ";" + 
                "Extended Properties=Excel 8.0;"; 
            OleDbConnection connection = new OleDbConnection(connectionString); 
            connection.Open(); 
            OleDbCommand selectCommand =new OleDbCommand("SELECT * FROM [Sheet1$]", connection); 
            OleDbDataAdapter dataAdapter = new OleDbDataAdapter(); 
            dataAdapter.SelectCommand = selectCommand; 
            DataSet dataSet = new DataSet(); 
            dataAdapter.Fill(dataSet); 
            connection.Close(); 
            double sum = 0.0;
            // We'll make some assumptions for brevity of the code.
            DataTable dataTable = dataSet.Tables[0];
            int cols = dataTable.Columns.Count;
            foreach (DataRow row in dataTable.Rows)
            {
                for (int i = 0; i < cols; i++)
                {
                    object val = row[i];
                    if (val is double)
                        sum += (double)val;
                }
            }
            return sum;
        }
    }
}
Joe Erickson
Faster than OleDb? Better back that claim up.
StingyJack
StingyJack: I don't blame you for being skeptical. I've edited my response with code which demonstrates that SpreadsheetGear is indeed faster than OleDb.
Joe Erickson
My numbers were a little different, but reasonably close. You really should post that on your site. When shopping around for an xls component, this would be really useful info to see.
StingyJack
We have resisted the urge to put this sort of benchmark on our site because the benchmark which matters is how fast it works for real applications, so we have quotes from real named customers on the right hand side of this page: http://www.spreadsheetgear.com/products/spreadsheetgear.net.aspx
Joe Erickson
A: 

I recommend trying Visual Studio Tools for Office and Excel Interop. It is very easy to use.

abatishchev
+1  A: 

You could look at ExcelMapper. It is a tool for reading Excel files as strongly typed objects, and it hides the details of reading the file from your code. It handles cases where the spreadsheet is missing a column or a column is missing data, so you read only the data you are interested in. You can get the code/executable for ExcelMapper from http://code.google.com/p/excelmapper/.