Hi,
I would like to write a nice function to aggregate data in an array (it is a numpy record array, but that does not change anything).
Say you have an array of data that you want to aggregate, grouped by one of its fields: for example, an array of dtype=[('name', (np.str_, 8)), ('job', (np.str_, 8)), ('income', np.uint32)],
and you want the mean income per job.
I wrote the function below; for this example it would be called as aggregate(data, 'job', 'income', mean).
def aggregate(data, key, value, func):
    data_per_key = {}
    for k, v in zip(data[key], data[value]):
        if k not in data_per_key:
            data_per_key[k] = []
        data_per_key[k].append(v)
    return [(k, func(data_per_key[k])) for k in data_per_key]
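To make the call concrete, here is a minimal sketch with a few made-up records (the names, jobs and incomes are hypothetical, not my real data). The function is the same as the one above, and I assume np.mean and np.median where I wrote mean:

```python
import numpy as np

# Hypothetical sample records, made up for illustration only.
dtype = [('name', (np.str_, 8)), ('job', (np.str_, 8)), ('income', np.uint32)]
data = np.array(
    [('Anna', 'cook', 1800),
     ('Bob', 'driver', 2200),
     ('Carl', 'cook', 2200),
     ('Dora', 'driver', 2000),
     ('Eve', 'driver', 2400)],
    dtype=dtype,
)

def aggregate(data, key, value, func):
    # Collect the values of each group in a list, keyed by the group label.
    data_per_key = {}
    for k, v in zip(data[key], data[value]):
        if k not in data_per_key:
            data_per_key[k] = []
        data_per_key[k].append(v)
    # Apply the aggregation function to each group's list of values.
    return [(k, func(data_per_key[k])) for k in data_per_key]

# Any callable that accepts a list of values can be passed as func.
mean_income = aggregate(data, 'job', 'income', np.mean)
median_income = aggregate(data, 'job', 'income', np.median)
min_income = aggregate(data, 'job', 'income', min)
```

Each result is a list of (job, aggregated income) pairs, one pair per distinct job, in the order the jobs first appear in the array.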
The problem is that I do not find it very elegant, and I would like to write it in one line. Do you have any ideas?
Thanks for your answers,
Louis
PS: I would like to keep func as an argument of the call, so that you can also ask for the median, the minimum, and so on.