We've got a set of recarrays of data for individual days - the first attribute is a timestamp and the rest are values.

Several of these:

ts,               a,   b,   c
2010-08-06 08:00, 1.2, 3.4, 5.6
2010-08-06 08:05, 1.2, 3.4, 5.6
2010-08-06 08:10, 1.2, 3.4, 5.6
2010-08-06 08:15, 2.2, 3.3, 5.6
2010-08-06 08:20, 1.2, 3.4, 5.6

We'd like to produce an array holding the average of each value across days, as if you laid all of the day data on top of each other and averaged the values that line up. The times in the timestamps all match up across days, so we can do it by creating a result recarray with the timestamps and every other column set to 0, then doing something like:

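# Accumulate each value column across all days
for day in day_data:
    result.a += day.a
    result.b += day.b
    result.c += day.c

# Divide by the number of days to get the mean at each timestamp
result.a /= len(day_data)
result.b /= len(day_data)
result.c /= len(day_data)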

It seems like a better way would be to convert each day to a 2D array with just the numbers (lopping off the timestamps), then average them all element-wise in one operation, but we can't find a way to do this - we always end up with a 1D array of objects.

Does anyone know how to do this?

A:

There are several ways to do this. If your data (at least in the columns you're interested in viewing as a 2D array) are all of exactly the same dtype, you can do this:

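# Assuming all data in the columns a, b, c are np.float64...
# Note that the .view() itself doesn't create a copy, but repack_fields does.
from numpy.lib.recfunctions import repack_fields
new_data = repack_fields(data[['a','b','c']]).view(np.float64).reshape((data.size, 3))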

Alternatively, if your columns have different but compatible numeric dtypes (e.g. a mix of float32, int32, and float64, but no string arrays) and you want to cast them all to the same type:

# Creates a copy and recasts data to a consistent datatype
new_data = np.vstack([data[item] for item in ['a','b','c']]).T
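To tie this back to the averaging in the question, here is a minimal sketch that applies the vstack conversion to each day and then averages across days in one operation. The sample values, the string timestamp field, and the names value_cols, stacked and averages are made up for illustration; only day_data and result come from the question.

```python
import numpy as np

# Hypothetical sample data: two days with matching times of day.
# The timestamp is stored as a plain string here purely for illustration.
dtype = [("ts", "U16"), ("a", "f8"), ("b", "f8"), ("c", "f8")]
day1 = np.array(
    [("2010-08-06 08:00", 1.0, 3.0, 5.0),
     ("2010-08-06 08:05", 2.0, 4.0, 6.0)],
    dtype=dtype,
).view(np.recarray)
day2 = np.array(
    [("2010-08-07 08:00", 3.0, 5.0, 7.0),
     ("2010-08-07 08:05", 4.0, 6.0, 8.0)],
    dtype=dtype,
).view(np.recarray)
day_data = [day1, day2]
value_cols = ["a", "b", "c"]

# Convert each day to a (rows, 3) float array, then stack the days
# into a single array of shape (days, rows, 3).
stacked = np.stack(
    [np.vstack([day[name] for name in value_cols]).T for day in day_data]
)

# Averaging over axis 0 collapses the "day" axis, leaving one mean per
# (timestamp, column) cell: [[2, 4, 6], [3, 5, 7]] for the data above.
averages = stacked.mean(axis=0)

# Optionally write the means back into a recarray that keeps the timestamps.
result = day1.copy()
for i, name in enumerate(value_cols):
    result[name] = averages[:, i]
```

This assumes every day has the same number of rows in the same order, as described in the question; np.stack will raise an error if the per-day shapes differ.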

Also note that it might be a good idea to look into either scikits.timeseries or using a numerical datestamp (e.g. matplotlib.dates.date2num) so that you can easily index your array by date ranges.
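As a rough illustration of indexing by date range, the sketch below swaps in NumPy's own datetime64 type rather than scikits.timeseries or matplotlib.dates.date2num, so treat it as one possible approach and not the one named above. The arrays ts and a and the range bounds are hypothetical sample values.

```python
import numpy as np

# Hypothetical timestamps (minute resolution) and one value column.
ts = np.array(
    ["2010-08-06T08:00", "2010-08-06T08:05", "2010-08-06T08:10",
     "2010-08-06T08:15", "2010-08-06T08:20"],
    dtype="datetime64[m]",
)
a = np.array([1.2, 1.2, 1.2, 2.2, 1.2])

# Build a boolean mask for an inclusive time range and use it to index.
start = np.datetime64("2010-08-06T08:05")
stop = np.datetime64("2010-08-06T08:15")
in_range = (ts >= start) & (ts <= stop)
selected = a[in_range]  # values at 08:05, 08:10 and 08:15

# The same timestamps as plain integers (minutes since the Unix epoch),
# if a purely numerical datestamp is preferred.
ts_minutes = ts.astype("int64")
```

Because the timestamps are numeric under the hood, comparisons and sorting work on the whole array at once, which is the main benefit of a numerical datestamp.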

Joe Kington
That's great, thanks! I'm still struggling to get used to doing things on the arrays as a whole - my instinct is to do things to elements individually. One note from my testing - while the .view(np.float) part doesn't make a copy, the fancy slicing does.
wilberforce