Hi all,

I have a scipy.sparse.dok_matrix (dimensions m x n) and want to fill its columns, one at a time, from flat NumPy arrays of length m:

for col in range(n):
    dense_array = ...
    dok_matrix[:,col] = dense_array

However, this code raises an exception in dok_matrix.__setitem__ when it tries to delete a nonexistent key (del self[(i,j)]).

So, for now I am doing it the inelegant way:

for col in range(n):
    dense_array = ...
    # nonzero() returns a tuple of index arrays; [0] holds the row indices
    for row in dense_array.nonzero()[0]:
        dok_matrix[row, col] = dense_array[row]

This feels very inefficient. So, what is the most efficient way of doing this?

Thanks!

A:

I'm surprised that your inelegant way doesn't have the same problem as the slice way. Looking at the SciPy code, this looks like a bug to me: when you set a given row and column of a dok_matrix to zero and that entry is already zero, you get an error, because it tries to delete the value at that row and column without checking whether it exists.

To answer your question: what you are doing in your inelegant way is exactly what the __setitem__ method currently does for your elegant slice assignment (after a couple of isinstance checks and so on). If you want to use the elegant way, you can fix the bug I mentioned in your own SciPy installation by opening dok.py in Lib/site-packages/scipy/sparse/ and changing line 222 from

    if value==0:

to

    if value==0 and self.has_key((i,j)):

Then you can use the elegant way and it should work just fine. I went to submit a bug fix, but it has already been fixed for the next version, and in exactly this way.

Justin Peel
The "inelegant" way filters out the zeros with `dense_array.nonzero()` before inserting values into the dok_matrix; that's why it doesn't crash. Thanks a lot!
PhilS
Doh, missed that, but glad I could help.
Justin Peel
Side note: I think the code shown above for `dok.py` is buggy, as zero values are stored in `dok_matrix` if the key (i,j) does not exist yet. I opened a ticket (http://projects.scipy.org/scipy/ticket/1160). Furthermore, my "inelegant" way performs much better if `dense_array` is sparse, as only nonzero values have to be checked and inserted (`__setitem__` is only called for them). So I'll stick with my old version, although the one you mentioned is more beautiful...
PhilS
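A minimal Python 3 sketch of the nonzero-filtering approach described above, assuming a current SciPy/NumPy install; the helper name `fill_columns_nonzero` and the sample `columns` data are hypothetical. Because only nonzero entries ever reach `__setitem__`, the zero-deletion path is never exercised, and the Python-level loop runs once per nonzero rather than once per row.

```python
import numpy as np
from scipy.sparse import dok_matrix


def fill_columns_nonzero(columns, m):
    """Fill a DOK matrix column by column, touching only nonzero entries.

    `columns` is a (hypothetical) list of flat length-m arrays, one per column.
    """
    D = dok_matrix((m, len(columns)))
    for col, dense_array in enumerate(columns):
        # flatnonzero gives the indices of the nonzero entries of a flat array,
        # so __setitem__ is never called with a zero value
        for row in np.flatnonzero(dense_array):
            D[row, col] = dense_array[row]
    return D


columns = [np.array([0.0, 2.0, 0.0, 1.5]),
           np.array([0.0, 0.0, 0.0, 0.0]),
           np.array([3.0, 0.0, 4.0, 0.0])]
D = fill_columns_nonzero(columns, m=4)
print(D.nnz)  # 4
```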
Yes, you're quite right about zero values being set. I originally had the self.has_key check as a separate, nested if, which I guess is how it should be done to avoid that problem. While I'm glad that sparse matrices exist in SciPy, in my opinion they still have a long way to go in development.
Justin Peel
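To make the three variants discussed in this thread concrete, here is a simplified toy model using a plain dict subclass. It is not SciPy's actual dok.py code, and the class names are hypothetical. It shows the unconditional delete raising KeyError, the one-line patch storing an explicit zero when the key is absent, and the nested if avoiding both problems.

```python
class BuggyDok(dict):
    """Toy model of the original logic: unconditional delete on zero."""

    def __setitem__(self, key, value):
        if value == 0:
            del self[key]  # raises KeyError if key was never stored
        else:
            dict.__setitem__(self, key, value)


class PatchedDok(dict):
    """Toy model of the one-line patch: zero falls through when key is absent."""

    def __setitem__(self, key, value):
        if value == 0 and key in self:
            del self[key]
        else:
            dict.__setitem__(self, key, value)  # may store an explicit 0


class NestedIfDok(dict):
    """Toy model of the nested-if version: zeros are never stored."""

    def __setitem__(self, key, value):
        if value == 0:
            if key in self:
                del self[key]
        else:
            dict.__setitem__(self, key, value)


p = PatchedDok()
p[0, 0] = 0
print(p)  # {(0, 0): 0}
```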
A: 

I think this bug has been fixed in SciPy 0.8.0.

Pablo Antolin
Yes, that's true.
PhilS
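If the whole matrix is being built from dense columns, a common alternative (not proposed in this thread) is to collect the nonzero coordinates first and construct the sparse matrix in one step via COO format, converting to DOK only if element-wise access is still needed. A sketch, assuming current SciPy/NumPy; `build_from_columns` and the sample data are hypothetical, and `columns` is assumed to be non-empty.

```python
import numpy as np
from scipy.sparse import coo_matrix


def build_from_columns(columns, m):
    """Build an m x len(columns) sparse matrix from flat dense columns in one step."""
    rows, cols, data = [], [], []
    for col, dense_array in enumerate(columns):
        nz = np.flatnonzero(dense_array)   # row indices of nonzeros in this column
        rows.append(nz)
        cols.append(np.full(nz.size, col))
        data.append(dense_array[nz])
    A = coo_matrix(
        (np.concatenate(data), (np.concatenate(rows), np.concatenate(cols))),
        shape=(m, len(columns)),
    )
    return A.todok()  # or A.tocsc() for fast column access


columns = [np.array([0.0, 2.0, 0.0, 1.5]),
           np.array([0.0, 0.0, 0.0, 0.0]),
           np.array([3.0, 0.0, 4.0, 0.0])]
A = build_from_columns(columns, m=4)
print(A.nnz)  # 4
```

Converting with `.tocsc()` instead of `.todok()` may be preferable if the matrix is mostly used column-wise afterwards.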