I may be misunderstanding, but I think it can be done this way:
> years = c(2006, 2006, 2006, 2006, 2001, 2001, 2001, 2001, 2001)
> scores = c(13, 65, 23, 34, 78, 56, 89, 98, 100)
> tapply(scores, years, quantile)
$`2001`
0% 25% 50% 75% 100%
56 78 89 98 100
$`2006`
0% 25% 50% 75% 100%
13.00 20.50 28.50 41.75 65.00
Is this right?
I mean the actual percentile of each observation. – Ryan Rosario
Edit:
I think this may do it then:
> tapply(scores, years, function(x) { f = ecdf(x); sapply(x, f) })
$`2001`
[1] 0.4 0.2 0.6 0.8 1.0
$`2006`
[1] 0.25 1.00 0.50 0.75
With your data:
> tapply(scores, years, function(x) { f = ecdf(x); sapply(x, f) })
$`2000`
[1] 0.3333333 0.6666667 1.0000000
$`2008`
[1] 0.5 1.0
Edit 2:
This is probably faster:
> tapply(scores, years, function(x) { f = ecdf(x); f(x) })
The function returned by ecdf() is vectorized, so the sapply() call is not needed :-)
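A quick way to convince yourself that the two forms agree is to compare them on one group. This is only a sketch using the example data from the top of this answer; the names x, elementwise and vectorized are mine, just for illustration:

```r
years  <- c(2006, 2006, 2006, 2006, 2001, 2001, 2001, 2001, 2001)
scores <- c(13, 65, 23, 34, 78, 56, 89, 98, 100)

x <- scores[years == 2001]     # scores for a single year
f <- ecdf(x)                   # step function: fraction of values <= its argument

elementwise <- sapply(x, f)    # calls f once per observation
vectorized  <- f(x)            # calls f once on the whole vector

all.equal(elementwise, vectorized)
```

all.equal() should report TRUE here, so dropping sapply() changes nothing except the number of function calls.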
Last modification, I promise :-). If you want names:
> tapply(scores, years, function(x) { f = ecdf(x); r = f(x); names(r) <- x; r })
$`2000`
1000 1700 2000
0.3333333 0.6666667 1.0000000
$`2008`
1500 2000
0.5 1.0
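If you would rather have the percentiles lined up with the original vectors (say, to add them as a column) instead of a list split by year, base R's ave() might be a convenient alternative. A minimal sketch, again using the example data from the top of this answer; the name pct is just illustrative:

```r
years  <- c(2006, 2006, 2006, 2006, 2001, 2001, 2001, 2001, 2001)
scores <- c(13, 65, 23, 34, 78, 56, 89, 98, 100)

# ave() applies FUN within each group of `years` and returns the results
# in the same order as the input vector
pct <- ave(scores, years, FUN = function(x) ecdf(x)(x))

data.frame(years, scores, pct)
```

Note that ecdf() gives the fraction of observations less than or equal to each value, so tied scores within a year get the same percentile.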