Okay folks, thanks for looking at this question. I remember doing the following in college, but I have forgotten the exact solution. Can anyone steer me in the right direction?

Step 1. I have several time series (we'll use three), each of length N. The series are aligned in time (e.g. obsOne[1] occurred along with obsTwo[1] and obsThree[1]).

obsOne[47, 136, -108, -15, 22, ...], obsTwo[448, 321, 122, -207, 269, ...], obsThree[381, 283, 429, -393, 242, ...]

Step 2. For each data series I create X range bins of width Z (e.g. for obsOne: bin1 = [< -108, -108], bin2 = [-108, -26], bin3 = [-26, 55], ... binX = [136, > 136]).
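A minimal sketch of this binning step, assuming equal-width bins between each series' minimum and maximum, plus one open-ended bin on either side as in the obsOne example. The names make_edges and bin_index are hypothetical helpers, and the half-open convention (a value equal to an edge goes to the bin on its right) is an assumption, not something the steps above specify.

```python
from bisect import bisect_right

def make_edges(series, n_inner):
    """Return n_inner + 1 equally spaced edges from min(series) to max(series)."""
    lo, hi = min(series), max(series)
    width = (hi - lo) / n_inner          # this is Z, the bin width
    # Append hi directly so the last edge is exact despite float rounding.
    return [lo + i * width for i in range(n_inner)] + [hi]

def bin_index(edges, value):
    """0 = below the first edge, len(edges) = at or above the last edge."""
    return bisect_right(edges, value)

obsOne = [47, 136, -108, -15, 22]
edges = make_edges(obsOne, 3)            # roughly [-108, -26.7, 54.7, 136]
indices = [bin_index(edges, v) for v in obsOne]
print(indices)
```

Looking each value up with bisect keeps the cost at O(log X) per sample and also works for unequal bin widths, should those be needed later.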

Step 3. Now create a table with all possible bin combinations across the data series. For example, with 4 bins and 3 data series there would be 4x4x4 = 64 possible outcomes. (e.g. row1 = obsOne bin1 + obsTwo bin1 + obsThree bin1, row2 = obsOne bin1 + obsTwo bin1 + obsThree bin2, ... row5 = obsOne bin1 + obsTwo bin1 + obsThree binX, row6 = obsOne bin1 + obsTwo bin2 + obsThree bin1, row7 = obsOne bin1 + obsTwo bin2 + obsThree bin2, ... row10 = obsOne bin1 + obsTwo bin2 + obsThree binX, ...)
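The table of combinations can be sketched with itertools.product, here for the 4-bin, 3-series case. The row_number helper is a hypothetical name. It shows that a row can be computed directly from the bin numbers, so the full table never has to be built or searched.

```python
from itertools import product

n_bins = 4      # bins per series (the 4x4x4 example)
n_series = 3

# Rows in table order: the last series (obsThree) varies fastest.
table = list(product(range(1, n_bins + 1), repeat=n_series))

def row_number(bins, n_bins):
    """1-based table row for a tuple of 1-based bin numbers."""
    row = 0
    for b in bins:
        row = row * n_bins + (b - 1)   # treat the bins as base-n_bins digits
    return row + 1

print(len(table))                      # number of possible outcomes
print(table[0], table[1])              # first two rows
print(row_number((1, 2, 1), n_bins))   # row of obsOne bin1, obsTwo bin2, obsThree bin1
```

The table grows as X to the power of the number of series, so for many bins or many series it is usually cheaper to count only the combinations that actually occur.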

Step 4. I now go back to the data series, find which row of the table each observation falls on, and count how many observations land on each row (e.g. obsOne[2] obsTwo[2] obsThree[2] = row 30 on the table, obsOne[X] obsTwo[X] obsThree[X] = row 52 on the table).

Step 5. I then take only the rows of the table with at least one match, count how many observations fell on each row, and divide by the total number of observations in the data series. That gives me the probability for that combination of ranges in the observed data.
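Steps 4 and 5 together amount to an empirical joint frequency table. A minimal sketch using collections.Counter, assuming the observations have already been converted to tuples of bin numbers; the bin_rows values below are made up for illustration and are not derived from the series above.

```python
from collections import Counter

def joint_probabilities(bin_rows):
    """Map each observed bin combination to count / total observations."""
    counts = Counter(bin_rows)
    total = len(bin_rows)              # total observations, not distinct rows
    return {combo: n / total for combo, n in counts.items()}

# Hypothetical, already-binned observations: one tuple per time step.
bin_rows = [(1, 1, 2), (3, 2, 2), (1, 1, 2), (4, 4, 1)]
probs = joint_probabilities(bin_rows)

for combo, p in sorted(probs.items()):
    print(combo, p)
```

Only combinations that actually occur get an entry, so rows with no matches are skipped automatically and the probabilities sum to 1.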

I apologize for this basic question; I am not a math expert. I did this many years ago but have forgotten which method I used. It was much faster than this long (ancient "by hand") method. I wasn't using Python at the time; it was some other proprietary package in C++. I'd like to know whether something out there can solve this problem in Python (we are now a Python shop). I could always extend something else, so Python is a soft constraint.

+1  A: 

Are you talking about something like this?

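from __future__ import division, print_function
from collections import defaultdict

obsOne= [47, 136, -108, -15, 22, ]
obsTwo= [448, 321, 122, -207, 269, ]
obsThree= [381, 283, 429, -393, 242, ]

class BinParams( object ):
    """Equal-width binning of one series into X bins over [min, max]."""
    def __init__( self, timeSeries, X ):
        self.mx= max(timeSeries )
        self.mn= min(timeSeries )
        self.X= X
        self.Z=(self.mx-self.mn)/X  # bin width
    def index( self, sample ):
        # Clamp so the maximum sample lands in the last bin (X-1), not bin X.
        return min( int((sample-self.mn)//self.Z), self.X-1 )

binsOne=  BinParams( obsOne, 4 )
binsTwo=  BinParams( obsTwo, 4 )
binsThree= BinParams( obsThree, 4 )

# Count how many observations fall in each (bin, bin, bin) combination.
counts= defaultdict(int)
for s1, s2, s3 in zip( obsOne, obsTwo, obsThree ):
    posn= binsOne.index(s1), binsTwo.index(s2), binsThree.index(s3)
    counts[posn] += 1

# Divide by the number of observations, not the number of occupied bins.
total= len(obsOne)
for k in counts:
    print( k, counts[k], counts[k]/total )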
S.Lott