I am a beginner in Python. I have recently been learning to use dictionaries, but my knowledge of them is still limited. I have an idea, but I am not sure whether it is workable in Python.

I have 3 documents that look like this:

DOCNO= 5

nanofluids  :0.6841

introduction:0.2525

module      :0.0000

to          :0.0000

learning    :0.0000





DOCID= 1

nanofluids  :0.0000

introduction:0.2372

module      :0.0000

to          :0.0000

learning    :0.1185



DOCNO= 12

nanofluids  :0.0000

introduction:0.0000

module      :0.5647

to          :0.0000

learning    :0.2084

I know how to store a single value per key in a dictionary. For example:

data={5: 0.67884, 1:0.1567, 12:3455}

But what I want to do now is store an array under each document number, like this:


import array

data={ 5:array([0.6841,0.2525,0.0000.0000,0.0000]), 1:array([0.0000,0.2372,0.0000,0.0000,0.1185]), 12:array([0.0000,0.0000,0.5647,0.0000,0.2084])} 

My Python v2.6.5 doesn't seem to let me do this.


Assuming the above operation works, I want to compute a dot product or matrix product to find the similarity between pairs of documents. My idea is to arrange the arrays in a 3x5 matrix and multiply it by its transpose, which is 5x3. This returns a 3x3 matrix that tells me the relationship between each pair of documents. For example:

[ 5:[0.6841,0.2525,0.0000,0.0000,0.0000],

1:[0.0000, 0.2372,0.0000,0.0000,0.1185],

12:[0.0000,0.0000,0.5647,0.0000,0.2084] ]

and multiply it by its transpose (I am not sure how to do that); the result will be a 3x3 matrix indexed "DOCNO" by "DOCNO".

The bottom line is that I need to be able to retrieve the DOCNO. For example, (5,1) shows the relationship between documents 5 and 1, and (1,12) shows the relationship between documents 1 and 12. I am not sure whether this is possible in Python, but other similar solutions will be appreciated. Thanks for your time.

+3  A: 

First, you should look at the Python documentation for arrays. There are three things wrong with your sample code:

  • You've imported the array module, but not the array class. Try this:

    from array import array

  • You've got 0.0000.0000 as a float in your first list; a comma is missing between the two values.

  • array takes two arguments: a typecode and the initialization values. Change your array([...]) calls to array('f', [...]) calls, and it should work.

But truth be told, Python doesn't have many basic tools for this built in (you can always write your own). If you're doing matrix algebra you should probably use NumPy.
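As a rough illustration of the "write your own" route, here is a minimal sketch that uses only the standard library. The helper name dot and the similarity dictionary are made up for this example, and the 'd' typecode (double precision) is one reasonable choice among others. Keying the result by (DOCNO, DOCNO) tuples keeps the document numbers retrievable, which is what the question asks for.

```python
from array import array

# Term weights per document, keyed by DOCNO (values copied from the question).
data = {
    5: array('d', [0.6841, 0.2525, 0.0000, 0.0000, 0.0000]),
    1: array('d', [0.0000, 0.2372, 0.0000, 0.0000, 0.1185]),
    12: array('d', [0.0000, 0.0000, 0.5647, 0.0000, 0.2084]),
}


def dot(u, v):
    """Dot product of two equal-length sequences (hypothetical helper)."""
    return sum(a * b for a, b in zip(u, v))


# One entry per ordered pair of documents, including each document with itself.
# This holds the same numbers as the 3x3 "matrix times its transpose" result.
similarity = dict(
    ((a, b), dot(data[a], data[b]))
    for a in data
    for b in data
)

print(similarity[(5, 1)])   # relationship between documents 5 and 1
print(similarity[(1, 12)])  # relationship between documents 1 and 12
```

This is fine for a handful of documents; it computes every pair twice and gets slow as the collection grows, which is where NumPy helps.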

It can handle both arrays and matrices, along with all the relevant transforms.

Chris B.
A: 

To fix your data assignment, try something like this:

from array import array

data={ 5:array('d',[0.6841,0.2525,0.0000,0.0000,0.0000]), 1:array('d',[0.0000,0.2372,0.0000,0.0000,0.1185]), 12:array('d',[0.0000,0.0000,0.5647,0.0000,0.2084])}

Either way, I would use NumPy for the rest of the calculations.
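A possible NumPy version of the 3x5 by 5x3 product from the question might look like the sketch below. It assumes NumPy is installed; the names doc_ids, row_of and the similarity function are invented for this example. The rows are put in sorted DOCNO order so that a (DOCNO, DOCNO) pair can be mapped back to a cell of the 3x3 result.

```python
import numpy as np

# Term weights per document, keyed by DOCNO (values copied from the question).
data = {
    5: [0.6841, 0.2525, 0.0000, 0.0000, 0.0000],
    1: [0.0000, 0.2372, 0.0000, 0.0000, 0.1185],
    12: [0.0000, 0.0000, 0.5647, 0.0000, 0.2084],
}

# Fix a row order so each matrix row can be traced back to its DOCNO.
doc_ids = sorted(data)                       # [1, 5, 12]
row_of = dict((doc, i) for i, doc in enumerate(doc_ids))

matrix = np.array([data[doc] for doc in doc_ids])  # shape (3, 5)
sim = np.dot(matrix, matrix.T)                     # shape (3, 3)


def similarity(a, b):
    """Look up the dot product of documents a and b by DOCNO (hypothetical helper)."""
    return sim[row_of[a], row_of[b]]


print(similarity(5, 1))   # relationship between documents 5 and 1
print(similarity(1, 12))  # relationship between documents 1 and 12
```

The result is symmetric, so similarity(5, 1) and similarity(1, 5) return the same value, and the diagonal holds each document's product with itself.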

Zuljin