views:

112

answers:

3

*edited 6/17/10

I'm trying to understand how to improve my code (make it more pythonic). Also, I'm interested in writing more intuitive 'conditionals' that would describe scenarios that are commonplace in biochemistry. The conditional criteria in the below program I've explained in Answer #2, but I am not satisfied with the code- it works fine, but isn't obvious and isn't easy to implement for more complicated conditional scenarios. Ideas welcome. Comments/criticisms welcome. First posting experience @ stackoverflow- please comment on etiquette if needed.

The code generates a list of values that are the solution to the following exercise:

"In a programming language of your choice, implement Gillespie’s First Reaction Algorithm to study the temporal behaviour of the reaction A--->B in which the transition from A to B can only take place if another compound, C, is present, and where C dynamically interconverts with D, as modelled in the Petri-net below. Assume that there are 100 molecules of A, 1 of C, and no B or D present at the start of the reaction. Set kAB to 0.1 s-1 and both kCD and kDC to 1.0 s-1. Simulate the behaviour of the system over 100 s."

def sim():
    # Set the rate constants for all transitions
    kAB = 0.1
    kCD = 1.0
    kDC = 1.0

    # Set up the initial state
    A = 100
    B = 0
    C = 1
    D = 0

    # Set the start and end times
    t = 0.0
    tEnd = 100.0

    print "Time\t", "Transition\t", "A\t", "B\t", "C\t", "D"

    # Compute the first interval
    transition, interval = transitionData(A, B, C, D, kAB, kCD, kDC)
    # Loop until the end time is exceded or no transition can fire any more
    while t <= tEnd and transition >= 0:
        print t, '\t', transition, '\t', A, '\t', B, '\t', C, '\t', D
        t += interval
        if transition == 0:
            A -= 1
            B += 1
        if transition == 1:
            C -= 1
            D += 1
        if transition == 2:
            C += 1
            D -= 1
        transition, interval = transitionData(A, B, C, D, kAB, kCD, kDC)


def transitionData(A, B, C, D, kAB, kCD, kDC):
    """ Returns nTransition, the number of the firing transition (0: A->B,
    1: C->D, 2: D->C), and interval, the interval between the time of
    the previous transition and that of the current one. """
    RAB = kAB * A * C
    RCD = kCD * C
    RDC = kDC * D
    dt = [-1.0, -1.0, -1.0]
    if RAB > 0.0:
        dt[0] = -math.log(1.0 - random.random())/RAB
    if RCD > 0.0:
        dt[1] = -math.log(1.0 - random.random())/RCD
    if RDC > 0.0:
        dt[2] = -math.log(1.0 - random.random())/RDC
    interval = 1e36
    transition = -1
    for n in range(len(dt)):
        if dt[n] > 0.0 and dt[n] < interval:
            interval = dt[n]
            transition = n
    return transition, interval       


if __name__ == '__main__':
    sim()
A: 

I don't know the Gillespie algorithm, but I assume that you have checked that the program converges to the correct equilibrium. Therefore I interpret you questions as

"Here is a working physics program, how can I make it more pythonic"

It would probably be more pythonic to do something like the following

R = [ kAB * A * C,  kCD * C, kAB * A * C]
dt = [(-math.log(1-random.random())/x,i) for i,x in enumerate(R) if x > 0]
if dt:
     interval,transition = min(dt)
else:
     transition = None

If you want to use python in physics, then I suggest that you learn numpy. Because numpy is faster for many problems. So here are some untested parts of a numpy solution. Add the following to the header of you program

from numpy import log, array, isinf, zeros
from numpy.random import rand

Then you can replace the inside TransitionData with something like the following

R = array([ kAB * A * C,  kCD * C, kAB * A * C])
dt = -log(1-rand(3))/R
transition = dt.argmin()
interval = dt[transition]
if isinf(interval):
    transition = None

I don't know if it would be more pythonic to raise a StopIteration exception instead of returning None, but that is a detail.

You should also store your concentrations in a single data structure. If you use numpy, then I suggest that you use an array. Similarly yoy can use an array dABCD to store the changes in the concentration (You can probably come up with better variable names). Add the following code outside your loop

ABCD = array([A,B,C,D])

dABCD = zeros(3,4)
dABCD[0,0] = -1#A decreases when transition == 0
dABCD[0,1] = 1 #B increases when transition == 0
dABCD[1,2] = -1#C decreases when transition == 1
dABCD[1,3] = 1 #D increases when transition == 1
.....      etc

Now you can replace you main loop with something like the following

while t <= tEnd:
       print t, '\t', transition, '\t', ABCD
       transition, interval = transitionData(ABCD, kAB, kCD, kDC)
       if transition != None:
            t += interval
           ABCD += dABCD[transition,:]
       else:
           break;
else:
       print "Warning: Stopping due to timeout. The system did not equilibrate"

There is probably more to do. As an example dABCD should probably be a sparse array, but I hope that these ideas can be a start.

nielsle
nielsle, you are the man. I really appreciate the input. I'm in the process of rewriting w/ numpy and matplotlib. Please check back tomorrow, as I'm sure I'll have some questions- they should be concise and answerable, scout's honor. Justin Peel, my thoughts on the monte carlo bit have been appended as an additional answer below (please comment, criticism is welcome!) Also, the 3rd 'answer' is some info on what the equations mean in the context of biophysics. Might be a little boring for most of you guys.
DocDubya
A: 

*Edit* I originally explained this wrong!!!! The following is correct though- Justin, this program uses a clever criteria to ‘weight’ each event. The RAB, RCD, and RDC values are all given a true/false parameter by multiplying kAB, kCD, and kDC by C or D, which in this case can be either one or zero. A zero value for D, and thus RDC would prevent dt[2] from being drawn in the

for n in range(len(dt)): if dt[n] > 0.0 and dt[n] < interval:

statement. Furthermore, the following-

    if transition == 1:
        C -= 1
        D += 1
    if transition == 2:
        C += 1
        D -= 1

dictates that when the event C->D occurs (transition 1), the next event necessarily must be D->C (transition 2), since of the three values in dt[], only dt[1] is nonzero and thus meets the aforementioned criteria. So, how are we weighting the likelihood that transition 0 or transition 1 occur? It's a little tricky, but is inherent in the following lines:

interval = 1e36
transition = -1
for n in range(len(dt)):  
    if dt[n] > 0.0 and dt[n] < interval:            
        interval = dt[n]            
        transition = n            
return transition, interval   

"for n in range (len(dt)):" returns all values of the list dt[]. The next line specifies the criteria that must be met, namely that each value has to be greater than 0 and less than interval. For transition 0, interval is 1e36 (which is supposed to be representative of infinity). The rub is that interval is then set to transition 0, so for the second value in dt[], transition 1, the criteria states that it must be smaller than the dt for transition 0 in order to occur... or in other words that it must have happened faster to have happened at all, which agrees with chemical logic. My biggest concern is that the accumulated t values mandated by the "t += interval" line might not be entirely fair... because since t1 firing is independent of t0 firing, t0 firing and taking say, .1 sec, shouldn't exclude t1 from using the same .1 sec to fire... but I'm working on a fix for this... suggestions welcome! This is a verbose print out from the script, including a firing of transition 1 and 2:

Time Transition A B C D

dt0= 0.0350693547214
dt1= 2.26710773787
interval= 1e+36
dt= 0.0350693547214
transition= 0

0.0 0 100 0 1 0

dt0= 0.000339596342313
dt1= 0.21083283004
interval= 1e+36
dt= 0.000339596342313 
transition= 0

0.0350693547214 0 99 1 1 0

dt0= 0.0510125874767
dt1= 1.26127048627
interval= 1e+36 
dt= 0.0510125874767
transition= 0

0.0354089510637 0 98 2 1 0

dt0= 0.0809691957218
dt1= 0.593246425076
interval= 1e+36
dt= 0.0809691957218
transition= 0

0.0864215385404 0 97 3 1 0

dt0= 0.00205040633531
dt1= 1.70623338677
interval= 1e+36
dt= 0.00205040633531
transition= 0

0.167390734262 0 96 4 1 0

dt0= 0.106140534256
dt1= 0.0915160378053
interval= 1e+36
dt= 0.106140534256
transition= 0
interval= 0.106140534256
dt= 0.0915160378053
transition= 1

0.169441140598 1 95 5 1 0

dt2= 0.0482892532952
interval= 1e+36
dt= 0.0482892532952
transition= 2

0.260957178403 2 95 5 0 1

dt0= 0.112545351421
dt1= 1.84936696832
interval= 1e+36
dt= 0.112545351421
transition= 0

0.309246431698 0 95 5 1 0

Justin, I'm not sure what you mean by dt[2] being less than 1e36 making it 'stay' on transition 2? That doesn't happen because of the

if transition == 2:
            C += 1
            D -= 1

statement. Anyone know of a more direct way to accomplish this

Haha, let the flaming begin- you guys are awesome though-I really appreciate the feedback! Stackoverflow is soooooo legit.

DocDubya
A: 

Info on the math behind simple stochastic simulation of chemical rxns:

Typically, processes like this are simulated as discrete events with each event occurring with probability 'P' given a specific rate constant 'k' and a number of possible events 'n' in the interval of time 'dt': P=1-e**(-k*dt*n). Here we are neglecting the actual time of each event (~0) and focusing instead on the interval of time in which the event occurs. Anyone familiar with N choose K problems/bernouli trials will appreciate the presence of 1/e e.g. when N=K and N->oo, the probability of not choosing a specific element from N approaches 1/e. Hence, in a stochastic chemical reaction (first order), the probability that a molecule will not undergo reaction (not be chosen) is some power of 1/e... that power dependent on the time interval and rate constant as well as the number of molecules and rate constant in question. Conversely, 1-(1/e)^xyz gives the probability that any specific molecule will react (be chosen).

In terms of simulation, it would be logical to divide our total time interval into ever smaller intervals and use a random number generator to predict whether an event happened in a given time interval- e.g. if we divided the dt for a single even into 10 smaller intervals, a number between 0 and 0.1 would indicate an event occurred, while a number between .1 and 1.0 would indicate it did not. There is however uncertainty as to exactly when the event occurred- so we must make our intervals smaller- this quickly becomes a loosing battle as uncertainty persists with this method.

The solution to this problem is to take the natural log (‘ln’ here, log() by default in py) of both sides of the above equation and solving for dt, which gives dt= (-ln(1-P))/(k*n). The probability P is then randomly generated, giving a definitive dt for each event.

DocDubya