I'm trying to understand how the order() function works in R. I was under the impression that it returned a permutation of indices, which when sorted, would sort the original vector.

For instance,

> a <- c(45,50,10,96)
> order(a)
[1] 3 1 2 4

I would have expected this to return 2 3 1 4, since the list sorted would be 10 45 50 96. Can someone help me understand the return value of this function?

Thank you.

+6  A: 

This seems to explain it.

I'll bet you're thinking of rank.
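
A quick way to see the difference, using the vector from the question. This is only a minimal sketch in base R; the names `ord` and `rnk` are introduced here for illustration.

```r
a <- c(45, 50, 10, 96)

# order() gives, for each slot of the sorted result, the index in `a`
# of the element that belongs in that slot.
ord <- order(a)   # 3 1 2 4
a[ord]            # 10 45 50 96, the same values as sort(a)

# rank() gives, for each element of `a`, its position in the sorted result.
rnk <- rank(a)    # 2 3 1 4

# For a vector without ties, the two are inverse permutations of each other.
order(ord)        # 2 3 1 4, which matches rank(a)
```

So `2 3 1 4` answers "where does each element end up?", while `3 1 2 4` answers "which element goes in each sorted slot?".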

duffymo
Ahh.. I see now. order() returns the indices of the vector in sorted order. Wonderful, thanks very much.
jeffshantz
A: 

To sort a 1D vector or a single column of data, "order" is not a relevant concept.

"Order" is necessary to sort two-dimensional data--i.e., multiple columns of data collected in a matrix or dataframe.

    Stadium Home Week Qtr Away Off Def Result       Kicker Dist
751     Out  PHI   14   4  NYG PHI NYG   Good      D.Akers   50
491     Out   KC    9   1  OAK OAK  KC   Good S.Janikowski   32
702     Out  OAK   15   4  CLE CLE OAK   Good     P.Dawson   37
571     Out   NE    1   2  OAK OAK  NE Missed S.Janikowski   43
654     Out  NYG   11   2  PHI NYG PHI   Good      J.Feely   26
307     Out  DEN   14   2  BAL DEN BAL   Good       J.Elam   48
492     Out   KC   13   3  DEN  KC DEN   Good      L.Tynes   34
691     Out  NYJ   17   3  BUF NYJ BUF   Good     M.Nugent   25
164     Out  CHI   13   2   GB CHI  GB   Good      R.Gould   25
80      Out  BAL    1   2  IND IND BAL   Good M.Vanderjagt   20

Here is an excerpt of data for field goal attempts in the 2008 NFL season, a dataframe I've called 'fg'. We'll imagine that these 10 data points represent all of the field goals attempted in 2008. Suppose you want to know the distance of the longest field goal attempted that year, who kicked it, and whether it was good or not; you also want to know the second-longest, the third-longest, and so on; and finally you want the shortest field goal attempt. How do you do that?

Well, you could just do this:

sort(fg$Dist, decreasing = TRUE)

which returns: 50 48 43 37 34 32 26 25 25 20

That is correct, but not really useful--we don't know who the kicker was, whether the attempt was successful, etc. Of course, we need the entire dataframe sorted on the "Dist" column. That would look like this:

    Stadium Home Week Qtr Away Off Def Result       Kicker Dist
751     Out  PHI   14   4  NYG PHI NYG   Good      D.Akers   50
307     Out  DEN   14   2  BAL DEN BAL   Good       J.Elam   48
571     Out   NE    1   2  OAK OAK  NE Missed S.Janikowski   43
702     Out  OAK   15   4  CLE CLE OAK   Good     P.Dawson   37
492     Out   KC   13   3  DEN  KC DEN   Good      L.Tynes   34
491     Out   KC    9   1  OAK OAK  KC   Good S.Janikowski   32
654     Out  NYG   11   2  PHI NYG PHI   Good      J.Feely   26
691     Out  NYJ   17   3  BUF NYJ BUF   Good     M.Nugent   25
164     Out  CHI   13   2   GB CHI  GB   Good      R.Gould   25
80      Out  BAL    1   2  IND IND BAL   Good M.Vanderjagt   20

This is what 'order' does. It is 'sort' for two-dimensional data.

Here's how it works. Above, 'sort' was used to sort the Dist column; to sort the entire dataframe on the Dist column, we use 'order' exactly the same way as 'sort' is used above:

ndx <- order(fg$Dist, decreasing = TRUE)

(I usually bind the array returned from 'order' to the variable 'ndx', which stands for 'index', because I am going to use it as an index array to sort.)

That was step 1; here's step 2:

'ndx', which is what 'order' returned, is then used as an index array to re-order the dataframe, 'fg':

fg_sorted = fg[ndx,]

fg_sorted is the re-ordered dataframe immediately above.

In sum, 'order' is used to create an index array (which specifies the sort order of the column you want sorted), which is then used to re-order the dataframe (or matrix).
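
The two steps can be reproduced end to end with a cut-down copy of the data. This is only a sketch: the data frame below keeps just the Kicker, Result and Dist columns from the excerpt, and building it by hand with data.frame() is an assumption made for illustration, not how the original 'fg' was loaded.

```r
# Cut-down stand-in for 'fg': three columns from the excerpt above,
# with the original row names kept so rows can be traced after sorting.
fg <- data.frame(
  Kicker = c("D.Akers", "S.Janikowski", "P.Dawson", "S.Janikowski", "J.Feely",
             "J.Elam", "L.Tynes", "M.Nugent", "R.Gould", "M.Vanderjagt"),
  Result = c("Good", "Good", "Good", "Missed", "Good",
             "Good", "Good", "Good", "Good", "Good"),
  Dist   = c(50, 32, 37, 43, 26, 48, 34, 25, 25, 20),
  row.names = c("751", "491", "702", "571", "654",
                "307", "492", "691", "164", "80")
)

# Step 1: index vector giving the row positions in decreasing order of Dist.
ndx <- order(fg$Dist, decreasing = TRUE)

# Step 2: use it as a row index, so every column moves together.
fg_sorted <- fg[ndx, ]

fg_sorted$Dist       # 50 48 43 37 34 32 26 25 25 20
rownames(fg_sorted)  # "751" "307" "571" "702" "492" "491" "654" "691" "164" "80"
```

For the single column, fg$Dist[ndx] gives the same values as sort(fg$Dist, decreasing = TRUE); the index form is what lets the Kicker and Result columns follow along.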

doug
Thanks for the detailed explanation -- makes perfect sense.
jeffshantz
-1: order makes pretty good sense for a vector. The basic property of order--that a[order(a)] is sorted--is not clearly stated.
Jyotirmoy Bhattacharya
Wrong. You need to look again--the 'basic property' is indeed shown very clearly in the two (grey-background) lines of code above. Because sorting w/ 'order' is two separate operations, I showed this using two lines of code--one creating the index vector and the second using that index to perform the sort. The OP asked for an explanation, not just a result, and I gave him one, as evidenced by the fact that he selected my answer and wrote the brief note above "Thanks [m]akes perfect sense". I even bound the final result to a variable called "fg_sorted".
doug