views:

4583

answers:

5

What is the most efficient way to enumerate every cell in every sheet in a workbook?

The method below seems to work reasonably for a workbook with ~130,000 cells. On my machine it took ~26 seconds to open the file and ~5 seconds to enumerate the cells . However I'm no Excel expert and wanted to validate this code snippet with the wider community.

DateTime timer = DateTime.Now;
Microsoft.Office.Interop.Excel.Application excelApplication = new Microsoft.Office.Interop.Excel.Application();
try
{
    exampleFile = new FileInfo(Path.Combine(System.Environment.CurrentDirectory, "Large.xlsx"));
    excelApplication.Workbooks.Open(exampleFile.FullName, false, false, missing, missing, missing, true, missing, missing, true, missing, missing, missing, missing, missing);
    Console.WriteLine(string.Format("Took {0} seconds to open file", (DateTime.Now - timer).Seconds.ToString()));

    timer = DateTime.Now;
    foreach(Workbook workbook in excelApplication.Workbooks)
    {
         foreach(Worksheet sheet in workbook.Sheets)
         {
      int i = 0, iRowMax, iColMax;
      string data = String.Empty;

      Object[,] rangeData = (System.Object[,]) sheet.UsedRange.Cells.get_Value(missing);

      if (rangeData != null)
      {
       iRowMax = rangeData.GetUpperBound(0);      
       iColMax = rangeData.GetUpperBound(1);              

       for (int iRow = 1; iRow < iRowMax; iRow++)
       {
            for(int iCol = 1; iCol < iColMax; iCol++)
        {
         data = rangeData[iRow, iCol] != null ? rangeData[iRow, iCol].ToString() : string.Empty;
         if (i % 100 == 0)
         {
          Console.WriteLine(String.Format("Processed {0} cells.", i));
         }

         i++;
        }                         
       } 
      }
     }

     workbook.Close(false, missing, missing);
    }

    Console.WriteLine(string.Format("Took {0} seconds to parse file", (DateTime.Now - timer).Seconds.ToString()));    
    }
    finally
    {
        excelApplication.Workbooks.Close();    
        excelApplication.Quit();         
    }

Edit:

Worth stating that I want to use PIA and interop in order to access properties of excel workbooks that are not exposed by API's that work directly with the Excel file.

+2  A: 

Excel PIA Interop is really slow when you are doing things cell by cell.

You should select the range you want to extract, like you did with the Worksheet.UsedRange property and then read the value of the whole range in one step, by invoking get_Value() (or just simply by reading the Value or Value2 property, I can't remember which one) on it.

This will yield an object[,], that is, a two dimensional array, which can be easily enumerated and is quick to be read.

EDIT: I just read your actual code and realized that it actually does what I proposed. Shame on me for not reading the question properly before answering. In that case, you cannot make it much faster. Excel PIA Interop is slow. If you need a quicker solution you will have to either migrate jExcelApi from Java to C# (not a terribly hard thing to do) or use some commercial component. I suggest to avoid the OLEDB interface at all costs, in order to keep your sanity.

Unrelated, but helpful tip: You should use the ?? operator. It is really handy. Instead of

data = rangeData[iRow, iCol] != null ? rangeData[iRow, iCol].ToString() : string.Empty;

you could just write

data = Convert.ToString(rangeData[iRow, iCol]) ?? string.Empty;

In that case, even String.Empty is not necessary since Convert.ToString(object) converts null to an empty string anyway.

DrJokepu
A: 

I think, this is the most efficient way, how do it with PIA. Maybe will littlebit faster using "foreach" insted of "for", but it will not dramatic change.

If is efficiency your primary goal, you should work with excel files directly - without excel application.

TcKs
+1  A: 

There is an open source implementation of an Excel reader and writer called Koogra. It allows you to read in the excel file and modify it using pure managed code. This would probably be much faster than the code you are using now.

Rune Grimstad
+1 Thanks for the pointer to Koogra. That's going to come in handy in my project.
Marve
A: 

For more information on For each loop enumerators (Foreach ADO.NET Schema Rowset Enumerator using Excel Sheets) and programming integration service (SSIS) using C# .Net and VB .Net, please visit the below link: http://www.sqllion.com/2009/06/programming-foreach-loop-container-%e2%80%93-enumerating-excel-sheets/

A: 

Blockquote

How to Check if a particluar cell in that range is locked,

I am posting the small snippet of my code, Please suggestme , the better way of writing to cell if the cell is not locked.

if (reader.HasRows) { minRow = 0; minCol = 0;

                                    // Process each result in the result set
                                    while (reader.Read())
                                    {
                                        // Create an array big enough to hold the column values
                                        object[] values = new object[reader.FieldCount];

                                        // Add the array to the ArrayList
                                        rowList.Add(values);

                                        // Get the column values into the array
                                        reader.GetValues(values);

                                        int iValueIndex = 0;
                                        int jValueIndex = 1;

                                        // If the Reading Format is by ColumnByColumn 
                                        if (CurTaskNode.ReadFormat == "ColumnbyColumn")
                                        {
                                            minCol = 0;
                                            int lengthHeader = 0;
                                            if (CurTaskNode.ReadHeader == true)
                                            {
                                                lengthHeader = CurTaskNode.HEADER_MAX_ROW - CurTaskNode.HEADER_MIN_ROW;
                                            }
                                            else
                                            {
                                                lengthHeader = CurTaskNode.HeaderData.Length;

                                            }
                                            for (int iCol = 0; iCol < lengthHeader; iCol++)
                                            {

                                                // Checking whether the Header data exists or not
                                                if (CurTaskNode.HeaderData[minCol] != "")
                                                {
                                                    //if (!(excelworksheet.Cells[CurTaskNode.DATA_MIN_ROW + minRow, CurTaskNode.DATA_MIN_COL + minCol]).Locked)
                                                    //{

                                                    // Assigning the Value from reader to the particular cell in excel sheet

                                                            excelworksheet.Cells[CurTaskNode.DATA_MIN_ROW + minRow, CurTaskNode.DATA_MIN_COL + minCol] = values[iValueIndex];
                                                            iValueIndex++;

                                                    //}

                                                }
                                                minCol++;
                                            }
                                            minRow++;
                                        }

In the code, writing to excel cell is

excelworksheet.Cells[CurTaskNode.DATA_MIN_ROW + minRow, CurTaskNode.DATA_MIN_COL + minCol] = values[iValueIndex];

here I have to check the condition whether the cell is locked or not, I tried it, but this is not correct.

CurTaskNode.DATA_MIN_ROW is the minrow value for that data to write in excel sheet, this value comes from XML file and if I have to write from (10, 2) to (20, 10) , cells, in which if the first cell is say locked, I cant write it to that cell and I have to proceed writing with other cells.

Please help me in doing this. Thanks, Ramm

Aditya
This would be better asked as a separate question, it's not really relevant here.
Edward Wilde