I have a data frame with a column of integers that I would like to use as the basis for a new categorical variable. I want to divide the variable into three groups and set the ranges myself (i.e. 0-5, 6-10, etc.). I tried 'cut', but it chose the group boundaries for me (equal-width intervals), which doesn't work well because my data are right-skewed. I have also tried if/then statements, but they return TRUE/FALSE values, and I would like to keep my original variable. I am sure there is a simple way to do this, but I cannot seem to figure it out. Any advice on a quick, simple way to do this?

I have something like this in mind:

x   x.range
3   0-5
4   0-5
6   6-10
12  11-15
A:
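x <- rnorm(100, 10, 10)
# the second argument gives the break points explicitly; by default the
# intervals are right-closed: (-Inf,0], (0,5], (5,6], (6,10], (10,Inf]
cut(x, c(-Inf, 0, 5, 6, 10, Inf))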
Ian Fellows
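Applied to the ranges in the question, a minimal sketch might pass the break points and the group names to 'cut' and store the result in a new column, so the original x is kept. The data frame df and its values are hypothetical, and the breaks assume x holds integers between 0 and 15 (values outside the breaks become NA).

```r
# Hypothetical example data, including the values shown in the question
df <- data.frame(x = c(3, 4, 6, 12, 0, 15))

# cut() intervals are right-closed by default, so for integer x the breaks
# -1, 5, 10, 15 give (-1,5], (5,10], (10,15], i.e. the groups 0-5, 6-10, 11-15
df$x.range <- cut(df$x,
                  breaks = c(-1, 5, 10, 15),
                  labels = c("0-5", "6-10", "11-15"))

# x is unchanged; x.range is a new factor column next to it
df
```

Choosing breaks half a unit or one unit below each range start is a common trick for integer data; for non-integer values you would need to decide which side of each boundary is closed (see the right and include.lowest arguments of cut).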
A:

Ian's answer ('cut') is the most common way to do this, as far as I know.

I prefer to use 'shingle', from the lattice package, because its argument for specifying the binning intervals seems a little more intuitive to me. For example:

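library(lattice)  # shingle() comes from the lattice package

data = sample(0:40, 200, replace=TRUE)
# each row of my_bins is one binning interval: (lower, upper)
my_bins = rbind(c(0, 5), c(5, 9), c(9, 19), c(19, 33), c(33, 41))
my_bins
# returns: (the binning intervals I've set)
#      [,1] [,2]
# [1,]    0    5
# [2,]    5    9
# [3,]    9   19
# [4,]   19   33
# [5,]   33   41

shx = shingle(data, intervals=my_bins)

# typing 'shx' at the interactive prompt gives you a nice frequency table:
shx
# Intervals:
#   min max count
# 1   0   5    23
# 2   5   9    17
# 3   9  19    56
# 4  19  33    76
# 5  33  41    46
# (the intervals are closed, so a value on a shared boundary such as 5 is
# counted in both neighbouring intervals; that is why the counts sum to 218)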
doug
Good idea with 'shingle'; it's nicer than the regular table function, for sure.
Stedy
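For comparison with the shingle answer above, here is a sketch of the same cut points using base R's 'cut' plus 'table'. Unlike a shingle, whose intervals may overlap, 'cut' assigns every value to exactly one interval, so it yields the kind of categorical variable the question asks for, and the counts sum to the sample size. The seed is an arbitrary assumption added for reproducibility, so the counts will not match the table printed in the shingle answer.

```r
set.seed(1)  # arbitrary seed so the sketch is reproducible
data <- sample(0:40, 200, replace = TRUE)

# Same cut points as my_bins, but the intervals do not overlap:
# [0,5], (5,9], (9,19], (19,33], (33,41]; include.lowest = TRUE keeps 0
bins <- cut(data, breaks = c(0, 5, 9, 19, 33, 41), include.lowest = TRUE)

# table() counts each value exactly once, so the counts sum to 200
tab <- table(bins)
tab
```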