Hi R-ers,

Does anyone have any good thoughts on how to code complex tabulations in R?

I am afraid I might be a little vague on this, but I want to set up a script that creates a bunch of tables of a complexity comparable to those in the Statistical Abstract of the United States (e.g. http://www.census.gov/compendia/statab/tables/09s0015.pdf), and I would like to avoid a whole bunch of rbind and cbind statements.

I have heard that SAS has a table-specification language; is there something of similar power for R?

Thanks!

+3  A: 

It looks like you want to apply a number of different calculations to some data, grouping it by one field (in the example, by state)?

There are many ways to do this. See this related question.

You could use Hadley Wickham's reshape package (see the reshape homepage). For instance, suppose you wanted the mean, sum, and count functions applied to some data grouped by a value (the grouping here is meaningless; it just uses the airquality dataset that ships with R):

> library(reshape)
> names(airquality) <- tolower(names(airquality))
> # melt the data to just include month and temp
> aqm <- melt(airquality, id="month", measure="temp", na.rm=TRUE)
> # cast by month with the various relevant functions
> cast(aqm, month ~ ., function(x) c(mean(x),sum(x),length(x)))
  month X1   X2 X3
1     5 66 2032 31
2     6 79 2373 30
3     7 84 2601 31
4     8 84 2603 31
5     9 77 2307 30
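If reshape isn't available, roughly the same month-by-statistic table can be built in base R. This is an illustrative sketch, not part of the reshape approach: it uses split() and sapply(), and because the summary function returns a named vector, the columns get real names instead of X1/X2/X3. The helper name stats() is hypothetical.

```r
# Base-R sketch (not reshape): mean, sum and count of Temp for each Month.
aq <- datasets::airquality   # fresh copy with the original column names

# Hypothetical helper: returning a *named* vector gives named columns
stats <- function(x) c(mean = mean(x), sum = sum(x), n = length(x))

# split() yields one Temp vector per month; sapply() returns a 3 x 5 matrix
# (one column per month), so t() flips it to one row per month
tab <- t(sapply(split(aq$Temp, aq$Month), stats))
print(tab)
```

Adding another statistic is then a matter of adding a named element to stats(), not another rbind or cbind call.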

Or you can use the by() function, where the grouping index (INDICES) would be the state. In your case, rather than apply a single function (e.g. mean), you can apply your own function that does several things at once (depending on your needs), for instance function(x) { c(mean(x), length(x)) }. Then run do.call("rbind", ...) (for instance) on the output to stack the per-group results into one table.
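A minimal sketch of the by() approach, assuming the built-in airquality data with Month standing in for the state grouping; the column names mean_temp and days are illustrative choices, not anything by() requires.

```r
# Sketch: by() + do.call("rbind", ...) with Month as the grouping index.
aq <- datasets::airquality   # fresh copy with the original column names

# by() passes each month's rows (as a data frame) to the function,
# which returns a named vector of several summaries
res <- by(aq, aq$Month,
          function(d) c(mean_temp = mean(d$Temp), days = nrow(d)))

# res is a list-like "by" object; do.call("rbind", ...) stacks the
# per-month vectors into a matrix with one row per month
tab <- do.call("rbind", res)
print(tab)
```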

Also, you might give some consideration to using a reporting package such as Sweave (with xtable) or Jeffrey Horner's brew package. There is a great post on the learnr blog about creating repetitive reports that shows how to use it.

Shane
Just a quick remark: `each` takes care of the column names as well: `cast(aqm, month ~ ., each(mean, sum, length))`. And the simplest is to use `c`: `cast(aqm, month ~ ., c(mean, sum, length))`
learnr
+1  A: 

Another option is the plyr package.

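library(plyr)
names(airquality) <- tolower(names(airquality))
# split airquality by month and return one row of summaries per month
ddply(airquality, "month", function(x){
    # 'nonsense' is deliberately meaningless; it is NA for any month with missing solar.r
    with(x, c(meantemp = mean(temp), maxtemp = max(temp), nonsense = max(temp) - min(solar.r)))
})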
Thierry
A: 

Hey there,

Here is an interesting blog post on this topic. The author tries to create a report analogous to the United Nations' World Population Prospects: The 2008 Revision report.

Hope that helps, Charlie

Charlie
Charlie: Isn't that the same link at the bottom of my answer?
Shane
Hi Shane, you're right, sorry; I didn't notice your link.
Charlie