I have two functions that start pretty similarly. Hence I wonder if this is the right moment to dive into inheritance in R.

firstfunc <- function(table,pattern="^Variable") {

dframe <- get(table)
cn <- colnames(get(table))
qs <- subset(cn, cn  %in% grep(pattern, cn, value=TRUE))

    .....

}

secondfunc <- function(table,pattern="^stat"){

dframe <- get(table)
cn <- colnames(get(table))
qs <- subset(cn, cn  %in% grep(pattern, cn, value=TRUE))

    ....

}

There will be more than two functions and two patterns. My tables contain a lot of variables that can easily be grouped by their names, which is why I use this pattern matching. It works well so far, and copying and pasting these few lines is not much of an effort. However, is it reasonable to write these lines into one function / method and let the others inherit from it?

Most help I have read on OO in R so far used examples that assigned attributes to data and then used generic functions. Unfortunately, I have not yet understood whether this can help in my case too.

Thanks for any suggestions or pointers to a good head-first start into this!

+4  A: 

Can you? Yes. S4 has features to handle this scenario. See the wiki page for some resources. Hadley also recently wrote a nice introduction (see the section on "Generic functions and methods").

You can see this with setMethod in any existing S4 code (see timeSeries for an example). Note the different signatures for the same function.
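As a minimal sketch of what "different signatures for the same function" means, the following defines one S4 generic with two methods. The generic name `describe` and its return strings are made up for illustration and are not taken from timeSeries or any other package:

```r
library(methods)

# Hypothetical generic, used only to illustrate S4 dispatch
setGeneric("describe", function(x) standardGeneric("describe"))

# Two methods for the same generic, selected by the class of `x`
setMethod("describe", signature("numeric"), function(x) "numeric input")
setMethod("describe", signature("character"), function(x) "character input")

describe(1.5)   # dispatches to the numeric method
describe("a")   # dispatches to the character method
```

Which method runs depends only on the class of the argument, not on anything inside the function bodies.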

Should you? Yes, you should, but you will be adding some complexity to the code. S4 doesn't come for free; it requires a lot more infrastructure. So there's a trade-off, and you will need to decide whether it's worth it.

Shane
Thanks, Shane! I will definitely check it out. I found a really comprehensive site, http://zoonek2.free.fr/UNIX/48_R/02.html, but it somehow warns against S4 (just use ctrl+f to find s4). What do you think about that?
ran2
@ran2 You "should" use it for this case, *in theory*. I have never used it with a package myself because of the additional overhead. I would personally just create 1 function with several subfunctions in the example that you provided. It's easier to follow.
Shane
In his case S4 will not work: methods in R dispatch according to the data types passed to the function, and he has two functions with the same data types.
VitoshKa
hmm. what to believe now :) ?
ran2
Dispatch "is based on the class of all arguments", so you can just make the class different for your pattern argument and you will have what you want. Changing the class of something is easy: `class(object) <- "new.name"`.
Shane
That being said, I agree with @VitoshKa; I would never do this myself.
Shane
+1  A: 

[Edit: Ah, I didn't notice you only posted the start of your functions, and the bodies are probably different]

The other thing you might want to look into, given all this 'get' stuff and use of column names, is the formula mechanism as used by lm() and friends. You can specify columns by name in a formula, something like:

foofunc(~Variable, data=mytable)

and use the model functions to get the values. Things like model.matrix and so on. I'm guessing from the 'gets' that you are passing names of objects around, which is a bad thing to do generally. Pass the object.
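A rough sketch of how such a function could look, assuming `foofunc` simply summarises the columns named in the formula. The body of `foofunc` and the toy `mytable` are assumptions for illustration; only `model.frame` and `colMeans` are existing R functions:

```r
# Hypothetical toy data standing in for a real table
mytable <- data.frame(Variable = c(1, 2, 3), other = c(10, 20, 30))

# Hypothetical implementation: pick the columns named in the formula
# and return their means
foofunc <- function(formula, data) {
  mf <- model.frame(formula, data = data)  # keeps only the columns in the formula
  colMeans(mf)
}

foofunc(~ Variable, data = mytable)
```

Here the data frame itself is passed in, so no `get()` on an object name is needed.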

Spacedman
The reason why I pass object names is that dbListTables returns table names as character vectors. Plus I just wanted to have a closer look at OO in R and thought this could be an opportunity to do so. Nevertheless, thx for the pointer!
ran2
+6  A: 

There is no inheritance of function parts in R. You cannot "inherit parts" of functions from other functions; you can only call functions from other functions. All OO paradigms in R (S3, S4, refClasses) are exactly what they say: object-oriented. Methods are dispatched according to the class of the objects they receive.

Your question is really how to get rid of code repetition.

There are two ways, one standard and one not so standard.

  • Standard way: Write functions for repeated code and call them from other functions. The drawback is that functions return only one object, but you have three. So you can do something like this:

    repeated_code <- function(table, pattern){
        objects <- list()
        objects$dframe <- get(table)
        objects$cn <- colnames(objects$dframe)
        # keep only the column names that match the pattern
        objects$qs <- grep(pattern, objects$cn, value=TRUE)
        objects  # return all three objects in one list
        }
    
    
    firstfunc <- function(table,pattern="^Variable") {
          objects <- repeated_code(table, pattern)
          # ...
          # manipulate objects$dframe, objects$cn, objects$qs
          # ...
          }
    
    
    secondfunc <- function(table,pattern="^stat") {
          objects <- repeated_code(table, pattern)
          # ...
          # manipulate objects$dframe, objects$cn, objects$qs
          # ...
          }
    
  • Not so standard way: Use unevaluated expressions:

     redundant_code <- expression({
          dframe <- get(table)
          cn <- colnames(dframe)
          qs <- grep(pattern, cn, value=TRUE)
     })
    
    
     firstfunc <- function(table,pattern="^Variable") {
         # eval's default envir is the frame of the calling function,
         # so dframe, cn and qs are created here, next to table and pattern
         eval(redundant_code)
         ...
     }
    
    
     secondfunc <- function(table,pattern="^stat") {
         eval(redundant_code)
         ...
     }
    

[Update: Since R 2.12.0 there is yet another, multi-assign way. Write a function which returns the list of objects (like in the "standard" case above). Then assign the objects in the returned list to the current environment with list2env:

    secondfunc <- function(table,pattern="^stat") {
          objects <- repeated_code(table, pattern)
          # environment() is the frame of secondfunc itself
          list2env(objects, envir = environment())
          ...
          }

]
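To make the "standard way" concrete, here is a small end-to-end sketch. The toy data frame `survey`, the helper `select_columns`, and the function `variable_means` are all hypothetical names chosen for illustration; the `envir` argument is an assumption added so that the lookup by name happens in the caller's environment:

```r
# Hypothetical toy data standing in for a real table
survey <- data.frame(Variable1 = 1:3, Variable2 = 4:6,
                     stat_mean = c(0.1, 0.2, 0.3))

# Shared helper: look the table up by name and select matching columns
select_columns <- function(table, pattern, envir = parent.frame()) {
  dframe <- get(table, envir = envir)
  cn <- colnames(dframe)
  list(dframe = dframe, cn = cn, qs = grep(pattern, cn, value = TRUE))
}

# One of several functions that reuse the helper
variable_means <- function(table, pattern = "^Variable") {
  # forward the caller's environment so `table` is looked up there
  objects <- select_columns(table, pattern, envir = parent.frame())
  colMeans(objects$dframe[objects$qs])
}

variable_means("survey")
```

Each further function only needs its own default pattern and its own body; the column selection lives in one place.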

VitoshKa
Cheers for the heads-up about `list2env`. Very interesting.
Richie Cotton
Thanks for the nice list. I'll try to figure out which one fits best for me. More generally speaking, is there one solution you'd prefer or one that you'd rather not suggest?
ran2
Writing functions (the first and third approaches) is completely in line with functional programming, so you have full support from R here. In particular, that means argument matching, argument completion, custom help files, local environments, lexical scoping: everything you are used to with functions in R. Expressions, by contrast, are just bare code (no arguments) and are difficult to manage because R offers no support for this style of programming. On the other hand, expressions are faster (no argument matching, no local environment, no function call); I would use them for simulations.
VitoshKa
As for me: first, I would try to streamline the code so that no multi-assign is necessary (i.e. call one function to return one object, call another function on that object, etc.). Second, if that is not easily possible, I would do multi-assign. And finally, for speed and for manipulating objects inside environments, I would use expressions.
VitoshKa