views:

115

answers:

2

Hello. I am trying to compute a definite double integral using scipy. The integrand is a bit complicated, as it contains some probability distributions to give weight to how likely is each value of x and y (like a mixture model). The following code evaluated to a negative number, but it should be bound by [0,1]. Additionally, it took about half an hour to compute.

I have two questions.

1) Is there a better way to calculate this integral?

2) Where is this negative value coming from? The big question for me is how to speed the calculation up, as I can find the bug in my code that's leading to the negative later on my own.

from scipy import stats
from scipy.integrate import dblquad
import itertools

p= [list whose entries are each different stats.beta(a,b) distributions]

def integrand(x,y):
        delta=x-y
        marg=0
        for distA,distB in itertools.permutations(p,2):
                first=distA.pdf(x)
                second=distB.pdf(y)
                weight1=0
                weight2=0
                for distC in p:
                        if distC == distA:
                                continue
                        w1=distC.cdf(x)-distC.cdf(y)
                        if weight1 == 0:
                                weight1=w1
                        else:
                                weight1=weight1*w1
                marg+=(first*weight1*second)
        I=delta*marg
        return I

expect=dblquad(integrand,0,1,lambda x: 0, lambda x: x)

This is asking essentially what for the expected value of the maximal distance between two points is in a vector of distributions. The limits of integration are y ∊ [0,x] and x ∊ [0,1]. This gave me about -.49, with an estimated error of the integral on the order of 10e-10, so it shouldn't be due to the integration method.

I've been fighting with this for a while and appreciate any help. Thanks.

edit: corrected typo

A: 

The error given by the integration method is just a number telling you how well the convergence behaviour is. Have you tried to calculate explicit values of the integrand?

By the way: Are you integrating pdf's? If yes: Are you sure about your integration limits?

Philip
@user485185: Yes, the integrand contains pdfs. In words, it is (X - Y) * P(X - Y). P(X - Y) is the probability of calculating x - y and is as follows: For a given pair of distributions, weight the probability of that distribution giving you the value x or y (depending on which variable you are looking at at that moment; evaluates as P_i(x)) with the probability that it is the min or max of the set of distributions (or else you wouldn't be calculating the maximal distance using that particular distribution; evaluates as CDF_i(x) - CDF_i(y)). This I integrate over x=[0,1] and y=[0,x].
Jason