Hello.

I'm trying to calculate the dot product of two very sparse associative arrays. Each array holds (ID, value) pairs, so the calculation should only involve the IDs common to both arrays, e.g. <(1, 0.5), (3, 0.7), (12, 1.3)> * <(2, 0.4), (3, 2.3), (12, 4.7)> = 0.7*2.3 + 1.3*4.7 = 7.72.

My implementation (call it dict) currently uses Dictionaries, but it is too slow for my taste.

double dot_product(IDictionary<int, double> arr1, IDictionary<int, double> arr2)
{
    double res = 0;
    double val2;
    // For each entry in arr1, look up the same ID in arr2 and accumulate the product.
    foreach (KeyValuePair<int, double> p in arr1)
        if (arr2.TryGetValue(p.Key, out val2))
            res += p.Value * val2;
    return res;
}

The full arrays have about 500,000 entries each, while the sparse ones have only tens to hundreds of entries each.

I did some experiments with toy versions of dot products. First I tried to multiply just two double arrays to see the ultimate speed I can get (let's call this "flat").

Then I changed the implementation of the associative array multiplication to use an int[] ID array and a double[] values array, walking both ID arrays in step and multiplying the values whenever the IDs match (let's call this "double").
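Roughly, the "double" walk looks like this (a simplified sketch with illustrative names, assuming both ID arrays are sorted ascending):

// ids1/vals1 and ids2/vals2 hold the two sparse arrays; both ID arrays are sorted.
double dot_product_double(int[] ids1, double[] vals1, int[] ids2, double[] vals2)
{
    double res = 0;
    int i = 0, j = 0;
    while (i < ids1.Length && j < ids2.Length)
    {
        if (ids1[i] == ids2[j])
        {
            // IDs match: multiply the corresponding values and advance both sides.
            res += vals1[i] * vals2[j];
            i++;
            j++;
        }
        else if (ids1[i] < ids2[j])
            i++;   // arr1 is behind, advance it
        else
            j++;   // arr2 is behind, advance it
    }
    return res;
}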

I then ran all three versions in debug and release builds, launched with F5 (with the debugger attached) or Ctrl-F5 (without the debugger). The results are as follows:

debug F5:    dict: 5.29s double: 4.18s (79% of dict) flat: 0.99s (19% of dict, 24% of double)
debug ^F5:   dict: 5.23s double: 4.19s (80% of dict) flat: 0.98s (19% of dict, 23% of double)
release F5:  dict: 5.29s double: 3.08s (58% of dict) flat: 0.81s (15% of dict, 26% of double)
release ^F5: dict: 4.62s double: 1.22s (26% of dict) flat: 0.29s ( 6% of dict, 24% of double)

I don't understand these results.
Why isn't the dictionary version optimized in the release F5 run the way the double and flat versions are?
Why does it speed up only slightly in the release ^F5 run while the other two are heavily optimized?

Also, since converting my code to the "double" scheme would mean a lot of work, do you have any suggestions on how to optimize the dictionary version?

Thanks!
Haggai

A: 

I don't think you can optimise the dot_product function itself much further. You have to walk one dictionary and check whether the second one contains each of its IDs. Maybe you could check which dictionary is smaller and run the foreach over that one; this could give you some additional performance if the two sizes differ a lot (e.g. arr1 = 500,000 and arr2 = 1,000).
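For example, something along these lines (just an untested sketch of the idea):

// Enumerate the smaller dictionary and probe the larger one with TryGetValue.
double dot_product_smaller_first(IDictionary<int, double> arr1, IDictionary<int, double> arr2)
{
    IDictionary<int, double> small = arr1.Count <= arr2.Count ? arr1 : arr2;
    IDictionary<int, double> large = ReferenceEquals(small, arr1) ? arr2 : arr1;

    double res = 0;
    double val;
    foreach (KeyValuePair<int, double> p in small)
        if (large.TryGetValue(p.Key, out val))
            res += p.Value * val;
    return res;
}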

But if you think this is still too slow, maybe the performance hit doesn't come from this function at all. Maybe the bigger problem is the creation and filling of the dictionaries, in which case you might do better with your simple array approach. It depends on how often you have to build these structures: do you create the dictionaries from scratch each time you need them, or are they created and filled at startup, with any later changes reflected directly in them?

To get a good answer to your question (from yourself), you should check not only the algorithm (which looks reasonably fast to me), but also how much time is spent creating and maintaining the infrastructure this function needs, and what those costs amount to.

Update

After reading your comment, I can't really see why this method is so slow (without using a profiler ;-)). Normally TryGetValue should be roughly O(1), and the calculation itself isn't expensive either. So the only thing left would be to optimise the foreach loop, but since something has to iterate over all the items, you can only shorten it a little by selecting the shorter of the two collections for this step (as already mentioned).

Apart from that, I can't see anything more you can do.

Oliver
Thanks. I did some profiling, and this method (alongside another very similar one) is the culprit behind my program's poor performance. The dictionaries are populated in advance, and even in the experiments above I measured only the execution time of the multiplication, without the data population time. The multiplied arrays are roughly the same size, but I tried your suggestion anyway; there's no change in execution time.
Haggai
+2  A: 

I recommend using SortedList<int, double> instead of the Dictionary. Instead of running TryGetValue repeatedly, you can now create two separate Enumerators and walk each list in parallel. Always move forward with whichever list is 'behind' in enumeration and any time you see two enumerated elements equal, you've found a match. Don't have my IDE handy at the moment, but pseudo-code is like this:

Get enumerator for vector A
Get enumerator for vector B
while neither enumerator is at the end
   if index(A) == index(B) then
     this element is included in dot product
     move forward in A and B
     continue next loop iteration

   if index(A) < index(B)
     move forward in A
   else
     move forward in B
continue while loop
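In C# it might look roughly like this (untested sketch, assuming both vectors are SortedList<int, double>, which enumerates in key order):

// using System.Collections.Generic;
static double DotProductSorted(SortedList<int, double> a, SortedList<int, double> b)
{
    double res = 0;
    var ea = a.GetEnumerator();
    var eb = b.GetEnumerator();
    bool hasA = ea.MoveNext();
    bool hasB = eb.MoveNext();
    while (hasA && hasB)
    {
        if (ea.Current.Key == eb.Current.Key)
        {
            // Matching IDs: include this element in the dot product.
            res += ea.Current.Value * eb.Current.Value;
            hasA = ea.MoveNext();
            hasB = eb.MoveNext();
        }
        else if (ea.Current.Key < eb.Current.Key)
            hasA = ea.MoveNext();   // A is behind, move it forward
        else
            hasB = eb.MoveNext();   // B is behind, move it forward
    }
    return res;
}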
Dan Bryant
Another possibility, if the sparse arrays are clustered (i.e. the index values tend to be close together, but may start at any arbitrary location) is to store a 'flat' array and starting index. You can then dot the arrays over the intersection of the two index sets. Depending on your problem domain, there may be index transformations that yield this sort of clustering (for example, a matrix with entries clustered near the diagonal can be re-indexed to place diagonal and near-diagonal elements close to each other.)
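For instance (hypothetical helper, assuming each vector is stored as a starting index plus a dense double[] block):

// Dot product over the overlap of two dense blocks that start at arbitrary indices.
static double DotProductClustered(int startA, double[] a, int startB, double[] b)
{
    int lo = Math.Max(startA, startB);                       // first index covered by both
    int hi = Math.Min(startA + a.Length, startB + b.Length); // one past the last shared index
    double res = 0;
    for (int i = lo; i < hi; i++)
        res += a[i - startA] * b[i - startB];
    return res;
}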
Dan Bryant
Thanks. The SortedList is actually an implementation of my "double" method, and I tested it now and it didn't improve running times considerably.
Haggai
Another possibility, if your target machine has multiple cores (most have at least hyperthreading these days), is to compute the dot product in parallel. If you can use .NET 4, there are extensions that make this much easier. There is overhead associated with this, but it might still be faster for your reasonably large sets. I suspect you are limited by memory cache misses rather than CPU cycles, but it may be worth trying.
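A rough, untested sketch of what I mean, against the flat double[] representation (Parallel.For with per-thread partial sums; requires System.Threading.Tasks):

static double DotProductParallel(double[] a, double[] b)
{
    object gate = new object();
    double total = 0;
    Parallel.For(0, a.Length,
        () => 0.0,                                       // per-thread partial sum
        (i, state, partial) => partial + a[i] * b[i],    // accumulate locally, no locking
        partial => { lock (gate) total += partial; });   // merge partials once per thread
    return total;
}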
Dan Bryant
I'll certainly look into it in the future, thanks.
Haggai
A: 

Thanks all. I've decided to convert the code to a parallel walk over sorted arrays (the "double" method), which, with the right wrapper, didn't take as much time to convert as I feared it would. Apparently the JIT/compiler optimizations don't work as well with generics as they do with plain arrays.

Haggai
I proposed a solution as well, just had to sleep on it first ;)
Mikael Svenson
A: 

You could try this, which is pretty fast. Define a struct like:

public struct MyDoubles
{
    public Double Val1 { get; set; }
    public Double Val2 { get; set; }
    public Double Product()
    {
        return Val1 * Val2;
    }
}

And define an array at least as long as the largest ID plus one:

MyDoubles[] values = new MyDoubles[1000000];

Then populate Val1 with values from array1 and Val2 with values from array2 using the id as the index position.
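For example, if the two sparse arrays are given as ID/value pairs (ids1/vals1 and ids2/vals2 are illustrative names), the filling step could look something like this:

// Write each value into the slot indexed by its ID; untouched slots stay 0.0.
for (int i = 0; i < ids1.Length; i++)
    values[ids1[i]].Val1 = vals1[i];
for (int i = 0; i < ids2.Length; i++)
    values[ids2[i]].Val2 = vals2[i];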

Then loop over and calculate:

public double DotProduct2(MyDoubles[] values)
{
    double res = 0;
    for (int i = 0; i < values.Length; i++)
    {
        res += values[i].Product();
    }
    return res;
}

Depending on your largest id, you might have a memory issue, and there's the matter of setting up the data structure as well.

My timings for the dictionary version vs. my proposed array/struct version yield these numbers:

Dict: 5.38s
Array: 1.87s

[Update with release build]

Dict: 4.70s
Array: 0.38s
Mikael Svenson