views:

345

answers:

4

I'm trying to write an import function for getting data out of an excel file. How I currently do it is as follows:

Private Sub ReadExcel(ByVal childform As PhoneDiag.frmData, ByVal FileName As String)

        Dim xlApp As Excel.Application
        Dim xlWorkBook As Excel.Workbook
        Dim xlWorkSheet As Excel.Worksheet

        xlApp = New Excel.ApplicationClass
        xlWorkBook = xlApp.Workbooks.Open(FileName)
        xlWorkSheet = xlWorkBook.Worksheets(1)
        Dim columnrange = xlWorkSheet.Columns
        Dim therange = xlWorkSheet.UsedRange

        ''Add rows by column
        For rCnt = 2 To therange.Rows.Count

            Dim rowArray(therange.Columns.Count) As String

            For cCnt = 1 To therange.Columns.Count

                Dim Obj = CType(therange.Cells(rCnt, cCnt), Excel.Range)
                Dim celltext As String
                celltext = Obj.Value.ToString
                rowArray((cCnt - 1)) = celltext

            Next

            childform.datagridSample.Rows.Add(rowArray)

        Next

        '' make sure we close the excel.exe service after use
        xlWorkBook.Close()
        xlApp.Quit()
        System.Runtime.InteropServices.Marshal.ReleaseComObject(xlWorkBook)
        System.Runtime.InteropServices.Marshal.ReleaseComObject(xlWorkSheet)
        System.Runtime.InteropServices.Marshal.ReleaseComObject(xlApp)

    End Sub

The problem is, of course, is that it runs horribly. From what I can gather, it more than likely comes down to this line:

Dim Obj = CType(therange.Cells(rCnt, cCnt), Excel.Range)

All I need is the text from the cells, not to create an object for each cell (and then not send them to garbage collection). Is there an easier way of just getting the text?

Ideally, if I can get a method for getting the text values of the cell, I'd like to get the multiple rowArray()'s added to a master array and update the program's values later.

If you see any other performance tips, let me know. It would be much appreciated. =b

EDIT: I also realize that I have two options if I were to create a master array, say mArr, to hold all the data. Would it be better performance-wise to have mArr to be large and the sub arrays small, or mArr to be small and the sub arrays to hold more of the information?

I ask because the files that will be imported will have more rows than columns, so I was wondering if there was any "set" way of doing it.

+1  A: 

here's a C# version (but you get the gist) that will get data into a datatable...

con = new OleDbConnection(@"Provider=Microsoft.Jet.OLEDB.4.0;Data Source="+SpreadsheetLocation+";Extended Properties=\"Excel 8.0;IMEX=1;HDR=NO\"");
OleDbDataAdapter da = new OleDbDataAdapter("SELECT * FROM ["+Worksheet+"$]", con);
DataTable dt = new DataTable();
da.Fill(dt);

where "SpreadSheetLocation" and "Worksheet" are a file path and work sheet names respectively. You can then convert the datatable rows to arrays if you like.

Update : you also don't need Excel installed in the machine for this solution...

Handleman
+1  A: 

For simple reading, I'd use the Excel Data Reader available on CodePlex.

Machines using the component don't need Excel installed and it's pretty easy to use. You can read a Worksheet into a DataSet.

Jay Riggs
+1  A: 

Reading cell content individually is a huge performance killer. My advice is to read first the entire range into an array of objects, and then retrieve the data from that array. I don't write in VB.NET, but in C# the code looks something like this:

Excel.Range firstCell = excelWorksheet.get_Range("A1", Type.Missing);
Excel.Range lastCell = excelWorksheet.Cells.SpecialCells(Excel.XlCellType.xlCellTypeLastCell, Type.Missing);
object[,] cellValues;
Excel.Range worksheetCells = excelWorksheet.get_Range(firstCell, lastCell);
cellValues = worksheetCells.Value2 as object[,];

This example reads the entire contents of a sheet into cellValues (careful about nulls). The same advice applies to writing to the sheet - do it all at once, with one array.
If you are interested, I have a longer post on this there
Oh and BTW, replace

xlApp = New Excel.ApplicationClass

by

xlApp = New Excel.Application
Mathias
Just out of curiosity, why replace ApplicationClass with Application?
Riddari
From MSDN: "This class supports the .NET Framework infrastructure and is not intended to be used directly from your code."http://msdn.microsoft.com/en-us/library/microsoft.office.interop.excel.applicationclass(office.11).aspxI don't know if anything could go wrong using it, but the Application interface is meant exactly for that. It's funny because the ApplicationClass is shown in lots of places on the internet, I suppose there must be one code snippet that has been copied over and over again...
Mathias
A: 

SpreadshsetGear for .NET will allow you to open a workbook and access the raw values of cells (numbers, text, logical or error) or get the formatted text of the cells. Because SpreadsheetGear runs as part of your application rather than COM Interop as with Excel, it will run much faster (see the comments on this page to see what some of our customers have said about performance).

You can see live samples here and download the free trial here.

Disclaimer: I own SpreadsheetGear LLC

Joe Erickson