Is there a simple way to count how many times each value occurs in a vector or in a column of a data frame? I essentially want the numerical values behind a histogram, but I do not know how to access them.

# sample vector
a <- c(1,2,1,1,1,3,1,2,3,3)

#hist
hist(a)

Thank you.

UPDATE:

On Dirk's suggestion I am using hist. Is there a better way than specifying the breaks as 1.9, 2.9 etc. when I know that all my values are integers?

 hist(a, breaks=c(1,1.9,2.9,3.9,4.9,5.9,6.9,7.9,8.9,9.9), plot=FALSE)$counts
+8  A: 

Use the `table` function.
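A minimal sketch of what that looks like for the sample vector from the question. The data frame `df` and its column `x` are made-up names for illustration only:

```r
a <- c(1, 2, 1, 1, 1, 3, 1, 2, 3, 3)

# table() returns one count per distinct value, named by that value
counts <- table(a)
print(counts)

# Look up the count for a single value by its name
n_ones <- counts[["1"]]

# The same call works on a data frame column (df and x are hypothetical names)
df <- data.frame(x = a)
col_counts <- table(df$x)

# Convert to a data frame if that is easier to work with downstream
counts_df <- as.data.frame(counts, stringsAsFactors = FALSE)
```

For this vector the counts are 5, 2 and 3 for the values 1, 2 and 3.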

mbq
As you were snitching about my suggestion of `hist` (even though that was what the OP asked about !!), let me mention that `table()` has a dark downside too: ever tried it with thousands of unique values in the object you're tabulating? ;-) At the end of the day, both are valuable, but for different purposes. As are `cut()`, `quantile()` etc pp
Dirk Eddelbuettel
I'm happy with any method that returns the count of each value, and it seems that I can control the number of breaks. However, I don't understand the result from hist: e.g. `hist(a, breaks=3, plot=FALSE)$counts` returns 5 2 0 3
celenius
@Dirk I was not snitching; in my view using table is a generic answer and hist is an optimization for the case when the number of unique values is large; indeed, fighting with bins when you have few numbers to count is not at all elegant and may even be inefficient.
mbq
@celenius This is why I prefer table.
mbq
@mbq: I didn't mean to sound so negative -- please imagine a smiley in the proper places. `table()` *is* a good answer provided a) you have only a small-ish number of different values and b) that is in fact what you want --- where celenius seems to be after `hist()`.
Dirk Eddelbuettel
@celenius: `hist()`, like many other R functions, is rich in features. If you say `breaks=3` you only say 'give me three breaks'. You could also say `breaks=seq(0,5)+0.5` to supply 0.5,1.5,2.5...,5.5 or many other forms.
Dirk Eddelbuettel
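A small sketch of the point about breaks, assuming the sample vector from the question. Per `help(hist)`, a single number for `breaks` is only a suggestion, so the bins you get need not match the number you asked for. The `5 2 0 3` reported above is what the half-unit break points 1, 1.5, 2, 2.5, 3 produce, while unit-wide breaks centred on the integers give one bin per value:

```r
a <- c(1, 2, 1, 1, 1, 3, 1, 2, 3, 3)

# Half-unit bins: [1,1.5], (1.5,2], (2,2.5], (2.5,3]
# The third bin is empty because no value lies strictly between 2 and 2.5
h_half <- hist(a, breaks = c(1, 1.5, 2, 2.5, 3), plot = FALSE)
h_half$counts

# Unit-wide bins centred on the integers 1..5: 0.5, 1.5, ..., 5.5
h_unit <- hist(a, breaks = seq(0, 5) + 0.5, plot = FALSE)
h_unit$counts
```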
@Dirk I agree; I even gave you a vote. Still, this is also a wiki-like service and some more info won't hurt it. Even our small discussion can be useful for others.
mbq
+3  A: 

Try this:

R> a <- c(1,2,1,1,1,3,1,2,3,3)
R> b <- hist(a, plot=FALSE)
R> str(b)
List of 7
 $ breaks     : num [1:5] 1 1.5 2 2.5 3
 $ counts     : int [1:4] 5 2 0 3
 $ intensities: num [1:4] 1 0.4 0 0.6
 $ density    : num [1:4] 1 0.4 0 0.6
 $ mids       : num [1:4] 1.25 1.75 2.25 2.75
 $ xname      : chr "a"
 $ equidist   : logi TRUE
 - attr(*, "class")= chr "histogram"
R> 

R is object-oriented and most methods give you meaningful results back. Use them.
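As a rough illustration of using the returned object, the sketch below pairs each count with the edges of its bin. The names `bins`, `lower` and `upper` are made up for this example:

```r
a <- c(1, 2, 1, 1, 1, 3, 1, 2, 3, 3)
b <- hist(a, plot = FALSE)

# There is always one more break than there are bins, so drop the last
# break for the lower edges and the first break for the upper edges
bins <- data.frame(
  lower = head(b$breaks, -1),
  upper = tail(b$breaks, -1),
  count = b$counts
)
print(bins)
```

Reading the counts next to their bin edges makes it clear that these are counts per interval, not per distinct value.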

Dirk Eddelbuettel
I think using hist is a bad idea, because it calculates bin counts, not particular value counts.
mbq
Thanks Dirk - I understand that R is object-oriented, but I don't know how to figure out that plot=FALSE is an argument I can pass to hist, for example.
celenius
Try `help(hist)`.
Dirk Eddelbuettel
+2  A: 

If you want to use hist, you don't need to type out the breaks by hand as you did; just use the seq function:

br <- seq(0.9, 9.9, 1)
num <- hist(a, br, plot=FALSE)$counts
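When every value is an integer, one possible variant (a sketch, not the only way) is to derive the breaks from the data so that each bin is centred on one integer. For positive integers, base R's `tabulate()` gives the same counts without any bins:

```r
a <- c(1, 2, 1, 1, 1, 3, 1, 2, 3, 3)

# Breaks at 0.5, 1.5, 2.5, 3.5: one unit-wide bin per integer in the range
br <- seq(min(a) - 0.5, max(a) + 0.5, by = 1)
num <- hist(a, breaks = br, plot = FALSE)$counts

# Label each count with the integer it belongs to
names(num) <- min(a):max(a)
print(num)

# For positive integers, tabulate() counts occurrences of 1, 2, ..., max(a)
num_tab <- tabulate(a)
```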

If you're looking for the count of one specific value, you can also use which.

For instance:

num <- length(which(a == 1))
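An equivalent shortcut is to sum the logical vector directly, since `TRUE` counts as 1. The sketch below also shows how the two forms treat missing values. The vector `a_na` is a made-up example:

```r
a <- c(1, 2, 1, 1, 1, 3, 1, 2, 3, 3)

# Both forms count the elements equal to 1
n_which <- length(which(a == 1))
n_sum   <- sum(a == 1)

# With missing values, which() silently drops NA,
# while sum() needs na.rm = TRUE to avoid returning NA
a_na <- c(a, NA)
n_which_na <- length(which(a_na == 1))
n_sum_na   <- sum(a_na == 1, na.rm = TRUE)
```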
nico