Is there an easy way to get rid of the traditional quartiles returned by summary.formula with method="reverse" from the Hmisc R library? I would like to get the Mean/SD + Min/Max for each of my continuous variables but didn't succeed. It is possible to pass a custom function through the fun argument, but it doesn't work when method="reverse".

+2  A: 

Does it have to be within the Hmisc package? If you have a dataframe of continuous variables you could get the same result with a simple use of the reshape package:

library(reshape)

df <- data.frame(a=rnorm(100),b=rnorm(100),c=rnorm(100))

f.summary <- function(x) {
  x <- melt(x)  # long format: one row per (variable, value) pair
  x <- cast(x, variable ~ ., c(mean, sd, min, max))  # one column per statistic
  return(x)
}

f.summary(df)
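If loading reshape is not an option, roughly the same table can be sketched in base R. This is only an illustration, again assuming an all-numeric data frame; the helper name f.summary.base is made up here and does not come from any package.

```r
# Hypothetical helper (illustrative name, not from any package):
# one row per variable, one column per statistic, base R only.
f.summary.base <- function(x) {
  stats <- sapply(x, function(v) {
    c(mean = mean(v), sd = sd(v), min = min(v), max = max(v))
  })
  t(stats)  # transpose so that variables become rows
}

set.seed(1)  # only to make the example reproducible
df <- data.frame(a = rnorm(100), b = rnorm(100), c = rnorm(100))
res <- f.summary.base(df)
print(res)
```

The result is a plain numeric matrix, so it can be rounded or passed to other formatting functions as needed.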

HTH

Brandon Bertelsen
Thanks! Actually I managed to write something similar, but without `reshape`; your solution looks by far better than mine :)
chl
+1  A: 

Arf... I just looked at the code of summary.formula() in the Hmisc package and I can confirm that Mean and SD are indeed computed but not shown when printing on the command line. So, we have to ask for them explicitly when calling the print() function, e.g.

library(Hmisc)
df <- data.frame(g=sample(LETTERS[1:3], 100, replace=TRUE), replicate(3, rnorm(100)))
s <- summary(g ~ ., method="reverse", data=df)
latex(s, prmsd=TRUE, digits=2)  # replace latex by print to output inline

which yields the following Table:

[image: the resulting table]

chl
+1  A: 

The answer is no. The package author has decided (as he states in the post Gnark linked to) that the minimum, maximum, and standard error are (paraphrasing) "certainly not descriptive" of continuous variables by categorical group.

You can set prmsd=TRUE in print.summary.formula.reverse to get the mean and standard deviation, but there's no way to get the min or max.

> Data <- data.frame(y=sample(1:2,20,TRUE),x=rnorm(20))
> print(summary.formula(y ~ x,data=Data,method="reverse"),prmsd=TRUE)


Descriptive Statistics by y

+-+---------------------------------------------------------+---------------------------------------------------------+
| |1                                                        |2                                                        |
| |(N=11)                                                   |(N=9)                                                    |
+-+---------------------------------------------------------+---------------------------------------------------------+
|x|-0.5382053/-0.3375862/ 0.3093839  -0.1434995+/- 1.1113628|-0.4464168/-0.1677906/ 0.3007129   0.1234988+/- 0.9666382|
+-+---------------------------------------------------------+---------------------------------------------------------+
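If the minimum and maximum are still needed per group, one option is to compute them outside Hmisc. The following is a minimal base R sketch, not a feature of summary.formula; the helper name msd.range is invented for illustration, and the data mimic the example above.

```r
# Sketch: Mean/SD/Min/Max of a continuous variable within each group,
# computed with base R only (this does not use Hmisc at all).
set.seed(1)  # only to make the example reproducible
Data <- data.frame(y = sample(1:2, 20, TRUE), x = rnorm(20))

# Illustrative helper, not part of Hmisc
msd.range <- function(v) {
  c(Mean = mean(v), SD = sd(v), Min = min(v), Max = max(v))
}

# split() groups x by y; sapply() applies the helper to each group;
# t() puts groups in rows and statistics in columns.
by.group <- t(sapply(split(Data$x, Data$y), msd.range))
print(by.group)
```

For several continuous variables, the same helper could be applied column by column, at the cost of assembling the final table by hand.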
Joshua Ulrich
@Joshua It seems we wrote our responses at almost the same time... In fact, Harrell uses an internal function called `sfn` which calls the `quantile()` function, so I think we could replace this with a call to `range()` to get our results, and overwrite the internal function, no?
chl
@chl: you can try, but I doubt it's that easy. The function (or others you use) may expect those three quantiles later.
Joshua Ulrich