views:

632

answers:

7

Background

I have a client who needs Excel VBA code that produces formula values moved to VB.NET. He is in the business of providing financial analytics, in this case delivered as an Excel add-in. I have translated the VBA into VB.NET code that runs in a separate DLL. The DLL is compiled as a COM Server because, well, Excel-callable .NET UDFs have to be. So far, so good: Excel cells have "=foo(Range1, Range2, ...)", the VB.NET Com Server's UDF is called, and the cell obtains a value that matches the VBA code's value.

The problem

The VB.NET code is way slower. I can stretch a range of VBA-based formulas and get instantaneous calculation. I can stretch a comparable range of VB.NET-based formulas and the calculation takes 5-10 seconds. It is visibly slower and unacceptable to the client.

There are a few possibilities that occur to me:

  1. native compilation of VBA is faster because of the absence of a switch
  2. the DLL may be loaded and unloaded for each UDF call
  3. the DLL calls Excel WorksheetFunction methods and requires an Application object, and creating the Application object is expensive
  4. calling an Excel WorksheetFunction method from the DLL is expensive

I don't think that (2) is true because I put calls to append to a file in the Shared New, the Public New, and Finalize functions, and all I get are:

Shared Sub New
Public Sub New
Finalize

when I open the spreadsheet, repeatedly stretch a formula range, and close the spreadsheet.

I don't think (3) is true because the file writing shows that the Application object is created only once.

The question

How do I figure out what is taking the time? How to profile in this environment? Are there obvious enhancements?

In the last category, I have tried to reduce the number of creations of an Application object (used for WorkSheetFunction calls) by making it Shared:

<Guid("1ECB17BB-444F-4a26-BC3B-B1D6F07D670E")> _
<ClassInterface(ClassInterfaceType.AutoDual)> _
<ComVisible(True)> _
<ProgId("Library.Class")> _
Public Class MyClass
    Private Shared Appp As Application ' Very annoying

Approaches taken

I've tried to reduce the dependence on Excel mathematical functions by rewriting my own. I've replaced Min, Max, Average, Stdev, Small, Percentile, Skew, Kurtosis, and a few more. My UDF code calls out to Excel much less. The unavoidable call seems to be taking a Range as an argument and converting that to a .NET Array for internal use.

+1  A: 

My guess based on a lot of experience using Excel via COM Interop is that it is the context switch and / or the mapping of data from Excel's internal data structures to .NET objects.

SpreadsheetGear for .NET might be an option for you. It is much faster than Excel via COM Interop (see what some customers say here) and it does support Excel compatible calculations and user defined functions (see the Custom Functions sample on this page).

You can download a free trial here if you want to try it out.

Disclaimer: I own SpreadsheetGear LLC

Joe Erickson
SpreadsheetGear for .NET replaces which part of the system: (1) Excel (where the cells and UDF calls live) or (2) the calls back to Excel from the VB.NET UDF or (3) something else?
hughdbrown
I don't know enough about your app to answer your question. SpreadsheetGear for .NET is an Excel compatible spreadsheet component which can be used as a library (no UI) or you can use our WinForms control. Many of our customers move applications from Excel / VBA / COM Interop / etc... to SpreadsheetGear - some for performance, some for protection of IP, some to have a 100% .NET solution, etc...
Joe Erickson
+1  A: 

I have the same experience as Joe. It is mostly the interop that is slow.

In most cases this can be solved by working with entire ranges instead if individual cells. You do this by using .Net arrays and the pass them to/from excel in one call.

e.g.

Dim values(10,10) As object

Dim r As Excel.Range = Me.Range("A1")
r = r.Resize(UBound(values, 1), UBound(values,2))
values = r.Value

For ii = 0 To UBound(values,1)
    For jj = 0 To UBound(values,2)
        values(ii,jj) = CType(values(ii,jj), Double)*2
    Next
Next

r.Value = values

This has solved all performance problems I have seen

adrianm
How do you get an Excel cell formula to pass an array rather than a Range? I have: "=foo($A$1:$A$125)". Do I just change the signature of the VB.NET UDF to accept an array rather than a Range? This gives me #VALUE errors.
hughdbrown
I didn't mean that you should change the arguments. Just don't access the individual cells in the range, i.e.range.Cells(ii,jj). Convert the range to a .Net array first.
adrianm
+2  A: 

The DLL is compiled as a COM Server because, well, Excel-callable .NET UDFs have to be

A bit of a show-stopper if true, I agree. But of course, it isn't true at all, why else would I have started that way...

You can write your UDFs in C++ against the Excel SDK and deliver them as an XLL, for one thing. It's a common practice among quantitative analysts in banks; in fact they seem to enjoy it, which says a lot about them as a group.

Another, less painful option, that I've only recently come across, is ExcelDNA, which, AFAICT, provides the nasty SDK/XLL bit with a way to hook up your .NET DLLs. It's sufficiently cool that it even lets you load source code, rather than building a separate DLL, which is great for prototyping (it makes use of the fact that the CLR actually contains the compiler). I don't know about performance: I haven't attempted to benchmark it, but it does seem to get around the COM Interop issue, which is well-known to be ghastly.

Beyond that, I can only endorse other recommendations: reference your workbook, its content and the Excel application as little as possible. Every call costs.

Mike Woodhouse
C++ is a non-starter for this client. I'll check out ExcelDNA.
hughdbrown
+2  A: 

I seriously suppose that interop from VB.NET to the COM server is done via marshalling. In VBA the methods were called directly - the control was passed into them at a cost of couple of processor instructions and that looked really fast. Now with marshalling a whole set of extra work is done and each call encounters a serious overhead. You need to either seriously reduce the number of calls (make each call do more work) or disable marshalling and work as if was with VBA. See this question for details on how to possibly accomplish the latter.

sharptooth
@sharptooth: I looked at the question you cited. Your advice seems to be, "Change the interface to make it support default marshalling, and that requires that the interface be automation-compatible." Assuming that an Excel Range is not automation-compatible, I can't get this advice to work in either direction: I can't avoid sending Ranges from Excel to my UDFs and I can't avoid passing Ranges to Excel when I invoke library routines in Excel. Do you have a more concrete idea of how the interfaces I am using might change to implement your advice?
hughdbrown
I guess you just can't change the interfaces. Instead you could try to disable marshalling. I didn't quite get what the OP of that question did that helped him do it, but it seems that he used the [STAThread] attribute on the calling thread and that induced calling CoInitializeEx() with COINIT_APARTMENTTHREADED flag. If that works for you it should solve your problem.
sharptooth
+2  A: 

I recently benchmarked moving data from Excel to .NET using various products/methods. All the .NET methods I tried were slower than VBA and VB6 but the best ones were able to use the XLL interface which gave better results than the Automation interface. the benchmark was reasonably optimised (transferring ranges to arrays etc) results were (millisecs for my benchmark)

  • VB6 COM addin 63

    C XLL 37

    Addin Express Automation VB.net 170

    Addin Express XLL VB.net 100

    ExcelDNA XLL CVB.Net 81

Managed XLL gave comparable times but also enables cusom marshallers which can be fast.

Charles Williams
+2  A: 

There is some more performance stuff for ExcelDna on CodePlex: http://exceldna.codeplex.com/Wiki/View.aspx?title=ExcelDna%20Performance.

For really simple functions, the overhead of calling a managed function through ExcelDna is very small, allowing you to make several hundred thousand UDF calls per second.

Govert
Very cool. As I read it, the actual source is revealed in the .DNA file. I am pretty sure that would not fly with my client. Still, I wish I'd looked into this route earlier.
hughdbrown
Hi - Your UDF source code can be in the .dna text file, or in an external compiled .NET .dll. The Getting Started tutorial shows how to reference an external library. If you want to secure your code, you have the usual .NET obfuscation issues, and I would recommend at least something like DotFuscator. But this is not different between ExcelDna and other .Net solutions.The .dna text files make .Net-based Excel add-ins possible without using any development environment - you just need the .Net 2.0 runtime installed and you can make an add-in using VB or C# code.
Govert
A: 

One thought. Instead of passing the Range object (it could be that every call onto the Ranbe object could be marshalled from .Net to Excel), collate all your parameters into basic types, doubles, strings, typed arrays and if necessary un-typed variant arrays, and pass them into the .Net DLL. That way you only have to marshall a variant.

-- DM

DangerMouse
@DangerMouse: Think about what the Excel UDF call looks like. The user wants to be able to select a few ranges in a worksheet. That's going to be marshaled to my code as an Excel Range. Realize that I want the user to pass multiple Ranges. How do you figure I am going to get variants marshaled to me instead? And how am I going to deal with with mass of variants and make sense of them?
hughdbrown