views:

217

answers:

2

The following code runs too slowly even though everything seems to be vectorized.

from numpy import *
from scipy.sparse import *

n = 100000;
i = xrange(n); j = xrange(n);
data = ones(n);

A=csr_matrix((data,(i,j)));

x = A[i,j]

The problem seems to be that the indexing operation is implemented as a python function, and invoking A[i,j] results in the following profiling output

         500033 function calls in 8.718 CPU seconds

   Ordered by: internal time

   ncalls  tottime  percall  cumtime  percall filename:lineno(function)
   100000    7.933    0.000    8.156    0.000 csr.py:265(_get_single_element)
        1    0.271    0.271    8.705    8.705 csr.py:177(__getitem__)
(...)

Namely, the python function _get_single_element gets called 100000 times which is really inefficient. Why isn't this implemented in pure C? Does anybody know of a way of getting around this limitation, and speeding up the above code? Should I be using a different sparse matrix type?

+3  A: 

You can use A.diagonal() to retrieve the diagonal much more quickly (0.0009 seconds vs. 3.8 seconds on my machine) . However, if you want to do arbitary indexing then that is a more complicated question because you aren't using slices so much as a list of indices. The _get_single_element function is being called 100000 times because it just iterating through the iterators (i and j) that you passed to it. A slice would be A[30:60,10] or something similar to that.

Also, I would use csr_matrix(eye(n,n)) to make the same matrix that you made with iterators just for simplicity.

Update:

Ok, since your question is truly about being able to access lots of random elements quickly, I will answer your questions as best as I can.

  • Why isn't this implemented in pure C?

The answer is simple: no one has gotten around to it. There is still plenty of work to be done in the sparse matrix modules area of Scipy from what I have seen. One part that is implemented in C is the conversions between different sparse matrix formats.

  • Does anybody know of a way of getting around this limitation, and speeding up the above code?

You can try actually diving into the sparse matrix modules and trying to speed them up. I did so and was able to get the time down to less than a third of the original when trying out your code above for random accesses using csr matrices. I had to directly access _get_single_element and pare down the code significantly to do that including taking out bound checks.

However, it was faster still to use a lil_matrix (though slower to initialize the matrix), but I had to do the accessing with a list comprehension because lil matrices aren't setup for the type of indexing you are doing. Using a list comprehension for the csr_matrix still leaves the lil matrix method way ahead by the way. Ultimately, the lil matrix is faster for accessing random elements because it isn't compressed.

Using the lil_matrix in its original form runs in about a fifth of the time of the code you have listed above. If I take out some bound checks and call lil_matrix's _get1() method directly, I can bring the time down further about 7% of the original time. For clarity that's a speed-up from 3.4-3.8 seconds to about 0.261 seconds.

Lastly, I tried making my own function that directly accesses the lil matrix's data and avoids the repeated function calls. The time for this was about 0.136 seconds. This didn't take advantage of the data being sorted which is another potential optimization (in particular if you are accessing a lot of elements that are on the same rows).

If you want faster than that then you will have to write your own C code sparse matrix implementation probably.

  • Should I be using a different sparse matrix type?

Well, I suggest the lil matrix if your intent is to access a whole lot of elements, but it all depends on what you need to do. Do you also need to multiply matrices for instance? Just remember that changing between matrices can at least sometimes (in certain circumstances) be quite fast so don't rule out changing to a different matrix format to do different operations.

If you don't need to do actually do any algebra operations on your matrix, then maybe you should just use a defaultdict or something similar. The danger with defaultdicts is that whenever an element is asked for that isn't in the dict, it sets that item to the default and stores it so that could be problematic.

Justin Peel
I used a diagonal matrix just for demonstration purposes. In my case the indices `i,j` are not along the diagonal, and the matrix is not necessarily diagonal.
celil
A: 

I think _get_single_element is only called when the default dtype of 'object' is used. Have you tried providing a dtype, such as csr_matrix((data, (i,j)), dtype=int32)

Shane Holloway
Just so you know, I gave this a shot and it didn't seem to make any difference. It was worth trying though.
Justin Peel