Hello.

I'm trying to calculate the dot product of two very sparse associative arrays. Each array holds (ID, value) pairs, so the calculation should only involve the IDs common to both arrays, e.g. <(1, 0.5), (3, 0.7), (12, 1.3)> * <(2, 0.4), (3, 2.3), (12, 4.7)> = 0.7*2.3 + 1.3*4.7 = 7.72.

My implementation (call it dict) currently uses Dictionaries, but it is too slow for my taste.

double dot_product(IDictionary<int, double> arr1, IDictionary<int, double> arr2)
{
    double res = 0;
    double val2;
    // For each entry in arr1, look up the same ID in arr2 and accumulate the product.
    foreach (KeyValuePair<int, double> p in arr1)
        if (arr2.TryGetValue(p.Key, out val2))
            res += p.Value * val2;
    return res;
}

The full arrays have about 500,000 entries each, while the sparse ones have only tens to hundreds of entries each.

I did some experiments with toy versions of dot products. First I tried to multiply just two double arrays to see the ultimate speed I can get (let's call this "flat").

Then I changed the implementation of the associative array multiplication to use an int[] ID array and a double[] values array, walking both ID arrays in step and multiplying the values whenever the IDs match (let's call this "double").
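Roughly, the "double" walk looks like this (a simplified sketch with illustrative names, assuming both ID arrays are sorted ascending):

// ids1/vals1 and ids2/vals2 hold the two sparse arrays; both ID arrays are sorted.
double dot_product_double(int[] ids1, double[] vals1, int[] ids2, double[] vals2)
{
    double res = 0;
    int i = 0, j = 0;
    while (i < ids1.Length && j < ids2.Length)
    {
        if (ids1[i] == ids2[j])
        {
            // IDs match: multiply the corresponding values and advance both sides.
            res += vals1[i] * vals2[j];
            i++;
            j++;
        }
        else if (ids1[i] < ids2[j])
            i++;   // arr1 is behind, advance it
        else
            j++;   // arr2 is behind, advance it
    }
    return res;
}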

I then ran all three versions in debug and release builds, launched with F5 (with the debugger attached) or Ctrl-F5 (without the debugger). The results are as follows:

debug F5:    dict: 5.29s double: 4.18s (79% of dict) flat: 0.99s (19% of dict, 24% of double)
debug ^F5:   dict: 5.23s double: 4.19s (80% of dict) flat: 0.98s (19% of dict, 23% of double)
release F5:  dict: 5.29s double: 3.08s (58% of dict) flat: 0.81s (15% of dict, 26% of double)
release ^F5: dict: 4.62s double: 1.22s (26% of dict) flat: 0.29s ( 6% of dict, 24% of double)

I don't understand these results.
Why isn't the dictionary version optimized in the release F5 run the way the double and flat versions are?
Why does it speed up only slightly in the release ^F5 run while the other two are heavily optimized?

Also, since converting my code to the "double" scheme would mean a lot of work, do you have any suggestions on how to optimize the dictionary version?

Thanks!
Haggai

A: 

I don't think you can optimise the dot_product function itself much further. You have to walk one dictionary and check whether the second one contains each of its IDs. Maybe you could check which dictionary is smaller and run the foreach over that one; this could give you some additional performance if the two sizes differ a lot (e.g. arr1 = 500,000 and arr2 = 1,000).
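For example, something along these lines (just an untested sketch of the idea):

// Enumerate the smaller dictionary and probe the larger one with TryGetValue.
double dot_product_smaller_first(IDictionary<int, double> arr1, IDictionary<int, double> arr2)
{
    IDictionary<int, double> small = arr1.Count <= arr2.Count ? arr1 : arr2;
    IDictionary<int, double> large = ReferenceEquals(small, arr1) ? arr2 : arr1;

    double res = 0;
    double val;
    foreach (KeyValuePair<int, double> p in small)
        if (large.TryGetValue(p.Key, out val))
            res += p.Value * val;
    return res;
}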

But if you think this is still too slow, maybe the performance hit doesn't come from this function at all. Maybe the bigger problem is the creation and filling of the dictionaries, in which case you might do better with your simple array approach. It depends on how often you have to build these structures: do you create the dictionaries from scratch each time you need them, or are they created and filled at startup, with any later changes reflected directly in them?

To get a good answer to your question (from yourself), you should check not only the algorithm (which looks reasonably fast to me), but also how much time is spent creating and maintaining the infrastructure this function needs, and what those costs amount to.

Update

After reading your comment, I can't really see why this method is so slow (without using a profiler ;-)). Normally TryGetValue should be roughly O(1), and the calculation itself isn't expensive either. So the only thing left would be to optimise the foreach loop, but since something has to iterate over all the items, you can only shorten it a little by selecting the shorter of the two collections for this step (as already mentioned).

Apart from that, I can't see anything more you can do.

Oliver
Thanks. I did some profiling, and this method (alongside another very similar one) is the culprit behind my program's poor performance. The dictionaries are populated in advance, and even in the experiments above I measured only the execution time of the multiplication, without the data population time. The multiplied arrays are roughly the same size, but I tried your suggestion anyway; there's no change in execution time.
Haggai
+2  A: 

I recommend using SortedList<int, double> instead of the Dictionary. Instead of running TryGetValue repeatedly, you can now create two separate Enumerators and walk each list in parallel. Always move forward with whichever list is 'behind' in enumeration and any time you see two enumerated elements equal, you've found a match. Don't have my IDE handy at the moment, but pseudo-code is like this:

Get enumerator for vector A
Get enumerator for vector B
while neither enumerator is at the end
   if index(A) == index(B) then
     this element is included in dot product
     move forward in A and B
     continue next loop iteration

   if index(A) < index(B)
     move forward in A
   else
     move forward in B
continue while loop
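In C# it might look roughly like this (untested sketch, assuming both vectors are SortedList<int, double>, which enumerates in key order):

// using System.Collections.Generic;
static double DotProductSorted(SortedList<int, double> a, SortedList<int, double> b)
{
    double res = 0;
    var ea = a.GetEnumerator();
    var eb = b.GetEnumerator();
    bool hasA = ea.MoveNext();
    bool hasB = eb.MoveNext();
    while (hasA && hasB)
    {
        if (ea.Current.Key == eb.Current.Key)
        {
            // Matching IDs: include this element in the dot product.
            res += ea.Current.Value * eb.Current.Value;
            hasA = ea.MoveNext();
            hasB = eb.MoveNext();
        }
        else if (ea.Current.Key < eb.Current.Key)
            hasA = ea.MoveNext();   // A is behind, move it forward
        else
            hasB = eb.MoveNext();   // B is behind, move it forward
    }
    return res;
}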
Dan Bryant
Another possibility, if the sparse arrays are clustered (i.e. the index values tend to be close together, but may start at any arbitrary location) is to store a 'flat' array and starting index. You can then dot the arrays over the intersection of the two index sets. Depending on your problem domain, there may be index transformations that yield this sort of clustering (for example, a matrix with entries clustered near the diagonal can be re-indexed to place diagonal and near-diagonal elements close to each other.)
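For instance (hypothetical helper, assuming each vector is stored as a starting index plus a dense double[] block):

// Dot product over the overlap of two dense blocks that start at arbitrary indices.
static double DotProductClustered(int startA, double[] a, int startB, double[] b)
{
    int lo = Math.Max(startA, startB);                       // first index covered by both
    int hi = Math.Min(startA + a.Length, startB + b.Length); // one past the last shared index
    double res = 0;
    for (int i = lo; i < hi; i++)
        res += a[i - startA] * b[i - startB];
    return res;
}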
Dan Bryant
Thanks. The SortedList is actually an implementation of my "double" method, and I tested it now and it didn't improve running times considerably.
Haggai
Another possibility, if your target machine has multiple cores (most have at least hyperthreading these days), is to compute the dot product in parallel. If you can use .NET 4, there are extensions that make this much easier. There is overhead associated with this, but it might still be faster for your reasonably large sets. I suspect you are limited by memory cache misses rather than CPU cycles, but it may be worth trying.
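A rough, untested sketch of what I mean, against the flat double[] representation (Parallel.For with per-thread partial sums; requires System.Threading.Tasks):

static double DotProductParallel(double[] a, double[] b)
{
    object gate = new object();
    double total = 0;
    Parallel.For(0, a.Length,
        () => 0.0,                                       // per-thread partial sum
        (i, state, partial) => partial + a[i] * b[i],    // accumulate locally, no locking
        partial => { lock (gate) total += partial; });   // merge partials once per thread
    return total;
}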
Dan Bryant
I'll certainly look into it in the future, thanks.
Haggai
A: 

Thanks all. I've decided to convert the code to a parallel walk over sorted arrays (the "double" method), which, with the right wrapper, didn't take as much time to convert as I feared it would. Apparently the JIT/compiler optimizations don't work as well with generics as they do with plain arrays.

Haggai
I proposed a solution as well, just had to sleep on it first ;)
Mikael Svenson
A: 

You could try this, which is pretty fast. Define a struct like:

public struct MyDoubles
{
    public Double Val1 { get; set; }
    public Double Val2 { get; set; }
    public Double Product()
    {
        return Val1 * Val2;
    }
}

And define an array at least as long as the largest ID plus one:

MyDoubles[] values = new MyDoubles[1000000];

Then populate Val1 with values from array1 and Val2 with values from array2 using the id as the index position.
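For example, if the two sparse arrays are given as ID/value pairs (ids1/vals1 and ids2/vals2 are illustrative names), the filling step could look something like this:

// Write each value into the slot indexed by its ID; untouched slots stay 0.0.
for (int i = 0; i < ids1.Length; i++)
    values[ids1[i]].Val1 = vals1[i];
for (int i = 0; i < ids2.Length; i++)
    values[ids2[i]].Val2 = vals2[i];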

Then loop over and calculate:

public double DotProduct2(MyDoubles[] values)
{
    double res = 0;
    for (int i = 0; i < values.Length; i++)
    {
        res += values[i].Product();
    }
    return res;
}

Depending on your largest id, you might have a memory issue, and there's the matter of setting up the data structure as well.

My timings for the dictionary version vs. my proposed array/struct version yield these numbers:

Dict: 5.38s
Array: 1.87s

[Update with release build]

Dict: 4.70s
Array: 0.38s
Mikael Svenson