In the statistical language R, mean() and median() are standard functions that do what you'd expect. mode() tells you the internal storage mode of the R object, not the value that occurs most often in its argument. But surely there is a standard library function that implements the statistical mode for a vector (or list)?

+1  A: 

Found this on the R mailing list; hope it's helpful. It is also what I was thinking anyway. You'll want to table() the data, sort, and then pick the first name. It's hackish but should work.

names(sort(-table(x)))[1]
Dan
That's a clever workaround as well. It has a few drawbacks: sorting can cost more time and space than max()-based approaches (so it is best avoided for bigger sample lists). Also, the output is of mode (pardon the pun/ambiguity) "character", not "numeric". And, of course, testing for a multi-modal distribution would typically require storing the sorted table to avoid computing it again.
mjv
A: 

R has so many add-on packages that some of them may well provide the [statistical] mode of a numeric list/series/vector.

However, the standard library of R itself doesn't seem to have such a built-in function! One way to work around this is to use a construct like the following (and to turn it into a function if you use it often):

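mySamples <- c(19, 4, 5, 7, 29, 19, 29, 13, 25, 19)
# tabulate() counts how often each positive integer value occurs
tabSmpl <- tabulate(mySamples)
SmplMode <- which(tabSmpl == max(tabSmpl))
# if several values tie for the maximum count, report no single mode
if (sum(tabSmpl == max(tabSmpl)) > 1) SmplMode <- NA
SmplMode
# [1] 19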

For bigger sample lists, consider storing the max(tabSmpl) value in a temporary variable (I don't know whether R would automatically optimize the repeated call).
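As a minimal sketch of the "turn this into a function" and "temporary variable" suggestions, the construct could be wrapped as below. The name sample_mode is hypothetical (it is not part of R or of the original answer), and the sketch assumes a non-empty vector of positive integers, since that is all tabulate() counts.

```r
# Hypothetical helper; the name sample_mode is an assumption, not an R built-in.
# Assumes x is a non-empty vector of positive integers (tabulate() ignores others).
sample_mode <- function(x) {
  counts <- tabulate(x)          # counts[i] is how often the value i occurs
  top <- max(counts)             # computed once and reused below
  modes <- which(counts == top)  # positions in counts equal the values themselves
  # several values tied for the top count: no single mode
  if (length(modes) > 1) NA_integer_ else modes
}

mySamples <- c(19, 4, 5, 7, 29, 19, 29, 13, 25, 19)
sample_mode(mySamples)
```

Returning NA for ties mirrors the behaviour of the snippet above; a caller who wants all tied values would need a different return convention.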

Reference: see "How about median and mode?" in this KickStarting R lesson
This seems to confirm that (at least as of the writing of this lesson) there isn't a statistical mode function in R (well... mode(), as you found out, reports the storage type of a variable).

mjv
+3  A: 

There is a package, modeest, which provides estimators of the mode of univariate unimodal (and sometimes multimodal) data, as well as the values of the modes of the usual probability distributions.

mySamples <- c(19, 4, 5, 7, 29, 19, 29, 13, 25, 19)

library(modeest)
mlv(mySamples, method = "mfv")

Mode (most likely value): 19 
Bickel's modal skewness: -0.1 
Call: mlv.default(x = mySamples, method = "mfv")

For more information see this page

gd047
+1  A: 

Here is another solution:

freq <- tapply(mySamples,mySamples,length)
#or freq <- table(mySamples)
as.numeric(names(freq)[which.max(freq)])
teucer
You can replace the first line with table.
Jonathan Chang
I thought 'tapply' was more efficient than 'table', but they both use a for loop. I think the solution with table is equivalent. I've updated the answer.
teucer
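Pulling together the concerns raised in this thread (avoiding a sort, keeping the original type instead of "character", and handling multi-modal data), one possible sketch uses unique(), match() and tabulate(). The name all_modes is hypothetical, and the function assumes a non-empty vector.

```r
# Hypothetical helper; all_modes is an illustrative name, not an R built-in.
# Returns every most-frequent value, keeping the type of x. Assumes x is non-empty.
all_modes <- function(x) {
  ux <- unique(x)                   # distinct values, in order of first appearance
  counts <- tabulate(match(x, ux))  # occurrences of each distinct value
  ux[counts == max(counts)]         # all values tied for the highest count
}

all_modes(c(19, 4, 5, 7, 29, 19, 29, 13, 25, 19))
all_modes(c("a", "b", "b", "a", "c"))
```

Because match() maps each element to its position in ux, tabulate() only ever sees positive integers here, so the approach is not limited to integer data the way a direct tabulate(x) is.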