I have a function that I use to get a "quick look" at a data.frame... I deal with a lot of survey data and this acts as a quick tool to see what's what.

f.table <- function(x) {

    if (is.factor(x[[1]])) { 
        frequency <- function(x) {
            x <- round(length(x)/n, digits=2)
        }
        x <- na.omit(melt(x,c()))
        x <- cast(x, variable ~ value, frequency)
        x <- cbind(x,top2=x[,ncol(x)]+x[,ncol(x)-1], bottom=x[,2])
    }

    if (is.numeric(x[[1]])) {
        frequency <- function(x) {
            x[x > 1] <- 1
            x[is.na(x)] <- 0
            x <- round(sum(x)/n, digits=2)
        }

        x <- na.omit(melt(x))
        x <- cast(x, variable ~ ., c(frequency, mean, sd, min, max))
        x <- transform(x, variable=reorder(variable, frequency))
    }

    return(x)
}

What I find happens is that if I don't define "frequency" outside of the function, it returns wonky results for data frames with continuous variables. It doesn't seem to matter which definition I use outside of the function, so long as I do.

Try:

n <- 100    
x <- data.frame(a=c(1:25),b=rnorm(100),c=rnorm(100))
x[x > 20] <- NA 

Now select either one of the frequency functions, paste it in at the top level, and try again:

frequency <- function(x) {
                x <- round(length(x)/n, digits=2)
            }
f.table(x)

Why is that?

A: 

I don't have the package that contains melt, but there are a couple of potential issues I can see:

  1. Your frequency functions do not return anything.
  2. It's generally bad practice to alter function inputs (x is the input and the output).
  3. There is already a generic frequency function in the stats package that ships with R, which may cause issues with method dispatch (I'm not sure).
Joshua Ulrich
Point 1 is not correct, Joshua; they do return something: the value of the last expression evaluated. `foo <- function(x) x <- round(x); bar <- foo(runif(10)); bar`
Gavin Simpson
@ucfagis: I was not suggesting the functions did not return a value. I was saying they did not use the `return` function. I don't know why I thought (1) could be a problem (it was late in the day for me).
Joshua Ulrich
It's from Hadley's reshape package.
Brandon Bertelsen
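On the return-value point raised in the comments above, here is a minimal base R sketch (the names `foo`, `bar`, `res_foo` and `res_bar` are only illustrative). It suggests that a function ending in an assignment still returns the assigned value, just invisibly, so nothing is auto-printed at the console:

```r
# A function whose last expression is an assignment still returns a value,
# but invisibly (nothing is auto-printed when it is called at the console).
foo <- function(x) x <- round(x)   # no explicit return(), ends in an assignment
bar <- function(x) round(x)        # same value, returned visibly

res_foo <- foo(c(1.2, 2.7))
res_bar <- bar(c(1.2, 2.7))

identical(res_foo, res_bar)        # both hold the same rounded vector

# withVisible() reports whether a result would be auto-printed.
vis_foo <- withVisible(foo(1.2))$visible
vis_bar <- withVisible(bar(1.2))$visible
```

So the missing `return()` changes printing behaviour, not the value handed back to the caller.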
A: 

Crucially, I think this is where your problem is. cast() evaluates those functions without reference to the function it was called from. Inside cast(), fun.aggregate is evaluated via funstofun which, although I don't really follow what it is doing, ends up getting stats:::frequency and not your local one.

Hence my comment on your question: what do you want the function to do? At the moment it seems necessary to define a "frequency" function in the global environment so that cast() or funstofun() finds it. Give it a unique name, say .Frequency(), so that it is unlikely to clash with anything and is the only thing found. Without knowing what you want to do with the function (rather than what you thought the function [f.table] should do) it is a bit difficult to provide further guidance, but why not define .FrequencyNum() and .FrequencyFac() in the global workspace and rewrite the cast() calls in your f.table() wrapper to use the relevant one?

.FrequencyFac <- function(X, N) {
    round(length(X)/N, digits=2)
}

.FrequencyNum <- function(X, N) {
    X[X > 1] <- 1
    X[is.na(X)] <- 0
    round(sum(X)/N, digits=2)
}
f.table <- function(x, N) {
    if (is.factor(x[[1]])) {
        x <- na.omit(melt(x, c()))
        x <- cast(x, variable ~ value, .FrequencyFac, N = N)
        x <- cbind(x,top2=x[,ncol(x)]+x[,ncol(x)-1], bottom=x[,2])
    }

    if (is.numeric(x[[1]])) {
        x <- na.omit(melt(x))
        x <- cast(x, variable ~ ., c(.FrequencyNum, mean, sd, min, max), N = N)
        ##x <- transform(x, variable=reorder(variable, frequency))
        ## left this out as I wanted to see what cast returned
    }
    return(x)
}

Which I thought would work, but it is not finding N, and it should be. So perhaps I am missing something here?

By the way, it is probably not a good idea to rely on a function finding n (as in your version) from outside the function. Always pass in the variables you need as arguments.
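To illustrate the advice about passing variables in as arguments, here is a minimal sketch in base R only, so it does not depend on reshape and says nothing about how cast() forwards arguments. The helper name `frequency_num` and the toy data frame are assumptions made up for the example:

```r
# Hypothetical helper: the denominator N is an explicit argument,
# not a global variable that has to exist when the function is called.
frequency_num <- function(x, N) {
    x[is.na(x)] <- 0      # count missing answers as "not selected"
    x[x > 1] <- 1         # collapse any value above 1 to 1
    round(sum(x) / N, digits = 2)
}

# Toy data, invented for this example.
df <- data.frame(a = c(1, 0, NA, 3), b = c(0, 0, 1, NA))

# Extra named arguments to sapply() are forwarded to the function.
freqs <- sapply(df, frequency_num, N = nrow(df))
freqs
```

Because N travels with the call, the result no longer depends on whatever `n` happens to be lying around in the workspace.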

Gavin Simpson
This really seems like a bug in `funstofun`. If `fun.aggregate` is of length 1 you get the expected behavior. That's because `funstofun` takes only the `match.call()` as the argument rather than the evaluated value of the argument. Maybe the right fix is to change funstofun to take in evaluated named arguments and move the deparse to the call site.
Jonathan Chang
It's more that cast should capture the `parent.frame()` in which it is called and pass that along to ensure the functions are evaluated in the correct environment. Programming with dynamic scope in R is very tricky to get right.
hadley
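The scoping issue described in these comments can be sketched in base R. This is not reshape's actual code: `agg_own_env`, `agg_caller_env` and `wrapper` are hypothetical stand-ins for a high-level function that receives an unevaluated function name and looks it up itself. The sketch assumes no function called `frequency` is defined in the global workspace.

```r
# Hypothetical stand-in for a high-level function that works from the
# *unevaluated* argument and resolves the function name on its own.
agg_own_env <- function(x, fun) {
    fun_name <- deparse(substitute(fun))
    # Lookup starts in the environment where agg_own_env was defined,
    # so a function that is local to the caller is never seen.
    f <- get(fun_name, envir = parent.env(environment()), mode = "function")
    f(x)
}

# Same idea, but the lookup starts in the caller's frame.
agg_caller_env <- function(x, fun) {
    fun_name <- deparse(substitute(fun))
    f <- get(fun_name, envir = parent.frame(), mode = "function")
    f(x)
}

wrapper <- function(x) {
    # Local definition that shadows stats::frequency inside wrapper() only.
    frequency <- function(x) length(x)
    list(own    = agg_own_env(x, frequency),
         caller = agg_caller_env(x, frequency))
}

res <- wrapper(c(10, 20, 30))
res$own      # stats::frequency was found; it gives 1 for a plain vector
res$caller   # the local function was found; it gives 3
```

The first variant silently picks up `stats::frequency`, which would be consistent with the "wonky" results in the question, while the second finds the function defined inside the wrapper.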
@hadley: With my version of `f.table`, am I doing something wrong in expecting `N` to be passed on to the `fun.aggregate` functions? Or is this caught up in why Brandon's original version isn't working as well; a scoping issue?
Gavin Simpson
I've found a number of other questions like this where scoping is problematic. I think I'm going to stick with what works and close it down for now. The "why" seems to be "functions within functions are bad ideas in R".
Brandon Bertelsen
@Brandon: I don't think that functions within functions are the problem, *per se*. The issue is that you are embedding a fairly high-level function within another function. You need to fully comprehend how the high-level function does its job to know whether the function-within-function will work or not. Unfortunately, as Hadley notes above, it is difficult to write these high-level functions in R.
Gavin Simpson