views: 729

answers: 3

I'm working on a fairly large project for a trading company in Philadelphia. The company utilizes automated trading algorithms which process streaming quotes and send out quotes for hundreds of products dozens of times per second. Obviously, performance is a significant concern. (Which makes me question why we're using VB.NET, but that's another topic entirely.)

I'm relatively new to the company and am working with another guy on some code that's been around for a while. This code utilizes a Microsoft.VisualBasic.Collection object to store all of the products (objects representing pairs of ETFs or stocks and a large amount of data about each) and does a LOT of searching/retrieving from that Collection.

As I understand it, the Collection class is deprecated and pretty much no one uses it anymore. In our more recent code we've been using .NET collections such as List(Of T) and Dictionary(Of TKey, TValue), and from what I understand it might make sense to replace the old Collection with a Dictionary. However, the source code is quite substantial, so going ahead with this replacement would be a significant undertaking. My question is just this:

Has anyone actually measured the performance difference between the old Collection and a .NET Dictionary? Is such a comparison, for whatever reason, inappropriate? It certainly seems that everything we are currently doing with the Collection we could do with a Dictionary; basically I just want to know if it makes sense for us to go through the code and make this transition, or if doing so would essentially be a waste.

EDIT: Originally in the question I referred to the current Collection we are using as a VB6 Collection. After reading the first two answers I realize it is more accurately a Microsoft.VisualBasic.Collection, which appears to be a class introduced for compatibility between VB6 and VB.NET. I think the question still holds.

Based on the first link provided in Kenneth Cochran's answer, I am led to believe that a Dictionary would indeed be better suited to our purposes than a Collection: it is several milliseconds faster over 10,000 runs both at retrieving items by key and at running "For Each" loops. At our company, this is a realistic scenario; there are lots of places in the code with statements like the following:

Dim ETF As ETFdetails = ETFcoll(sym)

And as I said, these lines execute on hundreds of products, many times per second. With this in mind I am inclined to think we really should go ahead and make the change, then measure any performance difference. I expect that we will see at least a mild but noticeable improvement.
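To make that concrete, here is a rough sketch of what I have in mind for the replacement (ETFdetails and sym are from our actual code; ETFdict and everything else below is just illustrative):

Dim ETFdict As New Dictionary(Of String, ETFdetails)()

' Populated wherever products are added, e.g. ETFdict.Add(etf.Symbol, etf)

' Hot-path lookup: no cast, and TryGetValue avoids an exception on a missing symbol
Dim ETF As ETFdetails = Nothing
If ETFdict.TryGetValue(sym, ETF) Then
    ' ... use ETF ...
End If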

Is there anything obviously wrong with what I've just said? If so, point it out!

+1  A: 

There's nothing wrong with VB.Net performance. It compiles to the same IL that C# does, which is in turn JIT-compiled to machine language. That's why it's called the .Net Framework rather than the .Net VM.

While I haven't seen a head-to-head comparison of the VB6 Collection vs VB.Net Dictionary, I would expect them to be similar since the underlying algorithm is essentially a hashtable either way. That said, if there is going to be a small difference I'm inclined to give the advantage to the Dictionary, because there's no casting/late binding involved. The system will spend less time worrying about checking or translating types.

Of course, this assumes you're using VB.Net in a strongly-typed way, with Option Strict and Option Explicit turned on.
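To illustrate (the names below are placeholders, not the asker's code): with Option Strict On, pulling a typed object out of the old collection forces an explicit cast, while the generic Dictionary hands the typed value back directly.

' Old collection stores everything as Object, so Option Strict requires a cast on retrieval
Dim oldColl As New Microsoft.VisualBasic.Collection()
oldColl.Add(someProduct, Key:="XLF")
Dim a As Product = DirectCast(oldColl("XLF"), Product)

' Generic Dictionary returns the typed value; no cast, no late binding
Dim dict As New Dictionary(Of String, Product)()
dict.Add("XLF", someProduct)
Dim b As Product = dict("XLF")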

Joel Coehoorn
I understand that .NET languages compile to the same IL; I wasn't questioning the company's use of VB.NET versus, say, C#. Rather, given the boss's extreme (I'd call it borderline obsessive) emphasis on performance, I am just personally unsure why we are not coding in a language like C++. Then again, I am relatively new to .NET; maybe my notion that C++ would be faster is misguided. It is the late binding that I was mainly concerned about. With the Collection we're using, the keys and values can be any type of object. I was under the impression this would affect performance, if only slightly.
Dan Tao
+2  A: 

If you are using VB.NET you are not using VB6 collections. The VB.NET collection is functionally equivalent to the VB6 collection, but they are not the same. http://www.vbmigration.com/Blog/post/2008/11/Speed-up-your-VBNET-collections.aspx has a comparison of the various .NET collection types with the VB6 collection, including the VB.NET collection. Each collection type has its strong points and weak points (why else would we have so many collection types to choose from?). Some are faster at insertion at the expense of searching, and vice versa. Some are faster with small collections while others are faster with large collections. Your choice should depend on which performance attribute is most important to you.

Here is a table I stumbled across that gives relative performance of the standard .NET collection types. Notice that the VB.NET collection is not included: http://www.artima.com/forums/flat.jsp?forum=152&thread=179998
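If you want numbers from your own environment rather than someone else's table, a crude Stopwatch harness along these lines would do it (the sizes, key format and module name are arbitrary placeholders):

Imports Microsoft.VisualBasic
Imports System.Diagnostics

Module LookupTimingSketch
    Sub Main()
        Const itemCount As Integer = 10000
        Const lookups As Integer = 1000000

        Dim keys(itemCount - 1) As String
        Dim coll As New Collection()
        Dim dict As New Dictionary(Of String, Integer)(itemCount)

        ' Build both containers with identical keys
        For i As Integer = 0 To itemCount - 1
            keys(i) = "SYM" & i.ToString()
            coll.Add(i, Key:=keys(i))
            dict.Add(keys(i), i)
        Next

        ' Time by-key retrieval from the VB.NET Collection
        Dim sw As Stopwatch = Stopwatch.StartNew()
        For i As Integer = 0 To lookups - 1
            Dim v As Integer = CInt(coll(keys(i Mod itemCount)))
        Next
        Console.WriteLine("Collection by key: {0} ms", sw.ElapsedMilliseconds)

        ' Time by-key retrieval from the generic Dictionary
        sw = Stopwatch.StartNew()
        For i As Integer = 0 To lookups - 1
            Dim v As Integer = dict(keys(i Mod itemCount))
        Next
        Console.WriteLine("Dictionary by key: {0} ms", sw.ElapsedMilliseconds)
    End Sub
End Module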

codeelegance
+1. I would emphasise that Dan needs to decide based on which performance attribute is important to him. That probably means making his own measurements. It also means setting explicit goals about how fast the system has to respond, and measuring the different components to see whether they match up. Even if Dictionary was faster, it would be unnecessary to change if Collection was fast enough for the job at hand.
MarkJ
From the link Kenneth provided I think it would make sense for us to convert the Collection to a Dictionary (see my edit). Does this seem reasonable to you?
Dan Tao
Yes it does, although I would be inclined to test the potential speed increase on your production machine before you put it in the production code. I don't know how feasible that is for you.
MarkJ
+1  A: 

Apart from performance, I would recommend using the dictionary anyway. It has generic type parameters, so you can directly specify the data types to be held by the dictionary. This catches many errors at compile time and should also improve performance, because many casts and runtime type checks become unnecessary.

The algorithmic complexity of key lookup in both data structures is O(1) on average.
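A small illustration of the compile-time benefit (Product is just a placeholder type):

' The old collection accepts anything, so a wrong type only surfaces at run time
Dim coll As New Microsoft.VisualBasic.Collection()
coll.Add("oops, a String instead of a Product", Key:="SPY")
Dim p As Product = DirectCast(coll("SPY"), Product)   ' InvalidCastException at run time

' The generic dictionary rejects the same mistake at compile time
Dim dict As New Dictionary(Of String, Product)()
' dict.Add("SPY", "oops")                              ' does not compile
dict.Add("SPY", New Product())
Dim q As Product = dict("SPY")                         ' no cast, no runtime type check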

Dario