tags:

views:

294

answers:

2

Hi All,

I am programatically opening excel workbooks under a folder tree to check for some project references using the following code -

workbook = app.Workbooks.Open(fileName,false,true,Missing,Missing,.....);
foreach(Reference r in workbook.VBProject.References)
{
      //check for a specific reference here
}

This works fine but my folder structure is very deep and I have over 20,000 spreadsheets stored in them. Sometimes depending on the size of the excel file, the call to Workbooks.Open() takes a long time (over 5 minutes per call on some files). Is there a faster more efficient way to do this?

Thanks for the help

A: 

I don't think you can increase the Workbooks.Open performance. However if your main intention is just to check if a particular reference is used by the Spreadsheet or not, then consider opening the Excel file in binary mode and searching for the dll string.(The path of the dll providing the functionality, which can be seen in the Location part of the References Window).

This would be very crude way, but if the Workbook.Open performance is really a bottleneck then you can definitely give it a try.

Adarsha
Thanks. I didn't try the binary option yet as it might not be very straightforward and managed it with the WorkBooks.Open for now.
+1  A: 

It seems like whenever you have to hit the Excel object model, you're going to take a performance hit. I agree with the previous poster, that if you want to speed up performance, you'll need to read the Excel files directly.

As a side note, since Excel 2007 files (*.xlsm, *.xltm) are essentially *.zip files, you would need to find and access the vbaProject.bin file directly. A quick look points to the path as (I changed the extension so I could browse the file):

..\Book1.zip\xl\vbaProject.bin

Obviously you could dig through that bin file manually and find particular references (as suggested by the previous poster), but if you're looking to loop through all of the references in a project, you'll need to use the API calls IStream/IStorage. There's a great article about reverse engineering the Office BIN files here: http://www.codeproject.com/KB/cs/office2007bin.aspx. To access references in vbaProject.bin, look for the section titled "Reading or updating vbaProject.bin parts". There is also a sample C# code project that demonstrates how to read an OLE container. I just took a peak at the code sample, so I can't attest to it's effectiveness, but it certainly seems in order.

Hope that helps!

TimS
True, we are not rolling for Office 2007 until later in the year so I am sticking to the existing code for now but Office 2007 would make it better to do this operation. Thanks for the tip.