Given a table where the first column is seconds past a certain reference point and the second one is an arbitrary measurement:
6 0.738158581
21 0.801697222
39 1.797224596
49 2.77920469
54 2.839757536
79 3.832232283
91 4.676794376
97 5.18244704
100 5.521878863
118 6.316630137
131 6.778507504
147 7.020395216
157 7.331607129
176 7.637492223
202 7.848079136
223 7.989456499
251 8.76853608
278 9.092367123
...
As you can see, the measurements are sampled at irregular time points. I need to smooth the data (in Python) by averaging each reading with all readings taken up to 100 seconds before it. Since the data table is huge, an iterator-based method is strongly preferred. Unfortunately, after two hours of coding I can't figure out an efficient and elegant solution.
Can anyone help me?
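For context, a minimal sketch of how the table can be fed in as an iterator of (time, value) pairs (the filename and helper name are just placeholders); the smoothing step is the part I'm stuck on:

def read_readings(path):
    """Lazily yield (time, value) pairs from a whitespace-separated table.
    The file is never loaded into memory in one piece."""
    with open(path) as f:
        for line in f:
            t, x = line.split()
            yield float(t), float(x)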
EDITs
I want one smoothed reading for each raw reading; the smoothed reading should be the arithmetic mean of the raw reading and any others in the previous 100 (delta) seconds. (John, you are right.) A brute-force illustration of what I mean follows these notes.
Huge means roughly 1e6 to 10e6 lines, and it has to work with tight RAM.
The data is approximately a random walk.
The data is sorted by time.
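For concreteness, here is a brute-force sketch of the output I'm after (the inclusive window boundary is my own choice). It is far too slow and memory-hungry for the real data, which is exactly the problem:

def smooth_naive(readings):
    """Reference version: for each (time, value) pair, yield the mean of that
    value and every earlier value within the last 100 seconds."""
    seen = []                                   # keeps everything: too much RAM
    for t, x in readings:
        seen.append((t, x))
        window = [v for (u, v) in seen if u >= t - 100]
        yield t, sum(window) / len(window)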
RESOLUTION
I have tested the solutions proposed by J Machin and yairchu. They both gave the same results; however, on my data set, J Machin's version scales roughly quadratically with the data size, while yairchu's is linear (a reconstruction of the linear idea is sketched after the table). Following are the execution times as measured by IPython's %timeit (in microseconds):
data size   J Machin   yairchu
       10       90.2      55.6
       50      930       258
      100     3080       514
      500    64700      2660
     1000   253000      5390
     2000   952000     11500
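The linear behaviour comes from keeping only the readings inside the current window and maintaining a running sum. Below is my own reconstruction of that idea (not the exact code from either answer); note that repeatedly adding to and subtracting from the running total can accumulate floating-point error on very long streams:

from collections import deque

def smooth_stream(readings, delta=100.0):
    """Yield (time, smoothed_value) for (time, value) input sorted by time.
    Memory use is bounded by the number of readings inside one window,
    and each reading enters and leaves the deque exactly once."""
    window = deque()
    total = 0.0
    for t, x in readings:
        window.append((t, x))
        total += x
        # evict readings older than `delta` seconds before the current one
        while window[0][0] < t - delta:
            _, old_x = window.popleft()
            total -= old_x
        yield t, total / len(window)

Usage would then be something like: for t, s in smooth_stream(read_readings('data.txt')): ...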
Thank you all for the help.