The real goal here is to find the quantile means (or sums, or medians, etc.) in Python. Since I'm not a power user of Python but have used R for a while, my chosen route is via Rpy. However, I ran into the problem that the returned list of means does not follow the order of the quantiles. In particular, I have the following in R:
> a = c(1, 2, 3, 4, 5, 6, 7, 8, 9, 10)
> b = c(2, 4, 20, 40, 200, 400, 2000, 4000, 20000, 40000)
> prob = seq(0,5)/5
> br = quantile(a,prob)
> rcut = cut(a, br, include.lowest = TRUE)
> quintile_means = tapply(b, rcut, mean)
> quintile_means
[1,2.8] (2.8,4.6] (4.6,6.4] (6.4,8.2] (8.2,10]
3 30 300 3000 30000
which is all very good. However, when I translate the code into Rpy, I get
>>> import rpy
>>> from rpy import r
>>> a = [1, 2, 3, 4, 5, 6, 7, 8, 9, 10]
>>> b = [2, 4, 20, 40, 200, 400, 2000, 4000, 20000, 40000]
>>> prob = [ x / 5.0 for x in range(6)]
>>> br = r.quantile(a, prob)
>>> rcut = r.cut(a, br, include_lowest=r.TRUE)
>>> quintile_means = r.tapply(b, rcut, r.mean)
>>> print quintile_means
[30.0, 300.0, 3000.0, 30000.0, 3.0]
Note that the final list is misordered (we know this because a and b are both sorted in this case). In general, I have no way to recover the correct order, from the lowest to the highest quantile, in Rpy. Any suggestions?
In addition (not as a substitute, since I'd still like to know the answer to the question above), if you can suggest a way to perform the analysis directly in Python, that would be great too. (I don't have numpy or scipy installed.) Thx!
EDIT: To clarify, a and b are paired but not necessarily ordered. For example, a is the size of the eyes and b is the size of the nose. I'm trying to find, for each quantile of a, the mean of the corresponding values of b. Thanks.
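To make the target concrete, here is a minimal plain-Python sketch of the computation I'm after. It assumes linear interpolation between order statistics for the breaks (which I believe matches R's default quantile type) and right-closed bins with the minimum included in the first bin, as in cut(..., include.lowest = TRUE). The names quantile and quantile_means are my own helpers, not part of any library, and duplicate breaks caused by ties are not handled specially.

```python
def quantile(sorted_vals, p):
    # Linear interpolation between neighbouring order statistics
    # (assumed equivalent to R's default quantile type).
    h = (len(sorted_vals) - 1) * p
    lo = int(h)
    hi = min(lo + 1, len(sorted_vals) - 1)
    return sorted_vals[lo] + (h - lo) * (sorted_vals[hi] - sorted_vals[lo])


def quantile_means(a, b, n_bins=5):
    # Breaks are computed from a alone; b only supplies the values to average.
    ordered = sorted(a)
    breaks = [quantile(ordered, i / float(n_bins)) for i in range(n_bins + 1)]
    sums = [0.0] * n_bins
    counts = [0] * n_bins
    for x, y in zip(a, b):
        # Bins are (lower, upper]; the first bin also holds the minimum.
        k = 0
        while k < n_bins - 1 and x > breaks[k + 1]:
            k += 1
        sums[k] += y
        counts[k] += 1
    # Bins are returned from lowest to highest; empty bins give None.
    return [total / n if n else None for total, n in zip(sums, counts)]


a = [1, 2, 3, 4, 5, 6, 7, 8, 9, 10]
b = [2, 4, 20, 40, 200, 400, 2000, 4000, 20000, 40000]
quintile_means = quantile_means(a, b)
```

Because the result is built bin by bin, its order is always lowest to highest quantile, and the pairs in a and b do not need to be sorted beforehand.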