I have a joint probability mass function stored as an array with shape, for example, (1,2,3,4,5,6), and I want to compute the probability table conditional on given values for some of the dimensions (that is, export the CPTs), for decision-making purposes.

The code I have at the moment is the following. The input is the dictionary "vdict", of the form {'variable_1': value_1, 'variable_2': value_2 ... }.

for i in vdict:
   dim = self.invardict.index(i) # The index of the dimension that our Variable resides in
   val = self.valdict[i][vdict[i]] # The value we want it to be
   d = d.swapaxes(0, dim)
   **d = array([d[val]])**
   d = d.swapaxes(0, dim)

...

So, what I currently do is:

  1. I translate each variable to its corresponding dimension in the CPT.
  2. I swap the zeroth axis with the axis I found in step 1.
  3. I replace the whole zeroth axis with just the desired value.
  4. I swap the dimension back to its original axis.

Now, the problem is that in order to do step 3, I have to (a) extract a sub-array and (b) put it in a list and convert that list back to an array, so that I end up with my new array.

The thing is, the line in bold creates new objects instead of just reusing references to the old ones. When d is very large (which happens to me) and the methods that use d are called many times (which, again, happens to me), the whole thing becomes very slow.
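To make the copying concrete, here is a small check. It is only a sketch: the array `d` and the index `val` below are made-up stand-ins, not the real data. `np.shares_memory` reports whether two arrays use the same underlying buffer.

```python
import numpy as np

# Hypothetical stand-in for the joint table; the real d is much larger.
d = np.arange(24, dtype=float).reshape(2, 3, 4)
val = 1

copied = np.array([d[val]])   # what the loop does: builds a brand-new array
viewed = d[val:val + 1]       # basic slicing: same shape, but only a view of d

same_values = np.array_equal(copied, viewed)   # both hold the same numbers
copy_shares = np.shares_memory(copied, d)      # the rebuilt array owns new memory
view_shares = np.shares_memory(viewed, d)      # the slice reuses d's memory
```

Both results have shape (1, 3, 4), but only the sliced one avoids allocating and filling a new buffer.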

So, does anyone have an idea for something that could substitute for this little piece of code and run faster? Maybe something that would let me compute the conditionals in place.

Note: I have to maintain the original axis order (or at least know how to update the variable-to-dimension dictionaries when an axis is removed). I'd rather not resort to custom dtypes.

A: 

By "stuff in bold" I mean this line:

d = array([d[val]])
mhourdakis
No need to "answer" in order to clarify. Just edit your question.
Andrew Jaffe
A: 

Ok, found the answer myself after playing a little with numpy's in-place array manipulations.

I changed the last three lines of the loop to:

    d = conditionalize(d, dim, val)

where conditionalize is defined as:

    from numpy import array

    def conditionalize(arr, dim, val):
        arr = arr.swapaxes(dim, 0)                  # bring the conditioned axis to the front (a view)
        shape = arr.shape[1:]                       # shape of the sub-array without the conditioned axis
        count = array(shape).prod()                 # number of elements in one such sub-array
        arr = arr.reshape(array(arr.shape).prod())  # flatten; NumPy returns a view when it can, otherwise a copy
        arr = arr[val*count:(val+1)*count]          # keep only the elements where the axis equals val
        arr = arr.reshape((1,) + shape)             # restore the sub-array shape with a length-1 leading axis
        arr = arr.swapaxes(0, dim)                  # move that axis back to its original position

        return arr

That reduced my program's execution time from 15 minutes to 6 seconds. Huge gain.
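For comparison, the same selection can probably also be written with plain basic slicing, which leaves every axis where it is and returns a view. The following is a minimal sketch, assuming `dim` is a valid axis and `val` is a non-negative integer index. The name `conditionalize_slice` is made up here for illustration, and it has not been timed against the version above.

```python
import numpy as np

def conditionalize_slice(arr, dim, val):
    # Hypothetical helper: keep only index `val` along axis `dim`,
    # leaving that axis in place with length 1.
    # Assumes val >= 0; a negative val would need slice(val, val + 1 or None).
    index = [slice(None)] * arr.ndim
    index[dim] = slice(val, val + 1)
    return arr[tuple(index)]       # basic slicing, so the result is a view

# Made-up example table with the shape from the question.
d = np.arange(720, dtype=float).reshape(1, 2, 3, 4, 5, 6)
sub = conditionalize_slice(d, 3, 2)   # condition axis 3 on index 2
```

Because the axis is kept with length 1, the variable-to-dimension dictionaries stay valid without any updates.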

I hope this helps someone who comes across the same problem.

mhourdakis