I have a function inside a loop inside a function. The inner function acquires and stores a large vector of data in memory (as a global variable... I'm using "R" which is like "S-Plus"). The loop loops through a long list of data to be acquired. The outer function starts the process and passes in the list of datasets to be acquired.

I programmed the inner function to store each dataset before moving to the next, so all the work of the outer function occurs as side effects on global variables... a big no-no. Is this better or worse than collecting and returning a giant, memory-hogging vector of vectors? Is there a superior third approach?

Would the answer change if I were storing the data vectors in a database rather than in memory? Ideally, I'd like to be able to terminate the function (or have it fail due to network timeouts) without losing all the information processed prior to termination.

A: 

It's tough to say definitively without knowing the language/compiler used. However, if you can simply pass a pointer/reference to the object that you're creating, then the size of the object itself has nothing to do with the speed of the function calls. Manipulating this data down the road could be a different story.
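
In R specifically, arguments are passed with copy-on-modify semantics, so handing a large vector to a function is cheap as long as the function only reads it. Here is a minimal sketch for checking this yourself with tracemem (base R, though it needs a build with memory profiling enabled, which the CRAN binaries have); the object and function names are just for illustration:

big <- rnorm(1e7)                       # roughly 80 MB of doubles
tracemem(big)                           # print a message whenever R duplicates this object

just_read <- function(x) sum(x)         # reading x does not copy it
modify <- function(x) { x[1] <- 0; x }  # writing to x forces a duplicate

just_read(big)   # no tracemem message: nothing was copied
modify(big)      # tracemem reports a copy at the point of modification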

Jeffrey
The language he's using is R: http://r-project.org/
Allen
+4  A: 

Use variables in the outer function instead of global variables. This gets you the best of both approaches: you're not mutating global state, and you're not copying a big wad of data. If you have to exit early, just return the partial results.

(See the "Scope" section in the R manual: http://cran.r-project.org/doc/manuals/R-intro.html#Scope)
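
A minimal sketch of that idea, assuming the inner function is something like fetch_one (the names here are made up, not from the question):

fetch_all <- function(datasets) {
  results <- list()                       # local to the outer function, not the global env
  for (name in datasets) {
    value <- tryCatch(fetch_one(name), error = function(e) NULL)
    if (is.null(value)) break             # e.g. a network timeout: stop early
    results[[name]] <- value
  }
  results                                 # partial results survive an early exit
}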

Allen
A: 

Third approach: inner function returns a reference to the large array, which the next statement inside the loop then dereferences and stores wherever it's needed (ideally with a single pointer store and not by having to memcopy the entire array).

This gets rid of both the side effect and the passing of large data structures.
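
R doesn't expose pointers directly, but environments do have reference semantics, so a rough R analogue of this approach could look like the sketch below (the file-reading inner function and file names are assumptions):

store <- new.env()                        # behaves like a reference: never copied on assignment

fetch_into <- function(name, env) {
  env[[name]] <- read.table(name, header = TRUE)  # assumed: each dataset is a file on disk
  invisible(NULL)
}

for (name in c("one.dat", "two.dat")) fetch_into(name, store)
ls(store)                                 # the datasets are in place, with no large copies made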

pjz
+3  A: 

It's not going to make much difference to memory use, so you might as well make the code clean.

Since R has copy-on-modify for variables, modifying the global object will have the same memory implications as passing something up in return values.

If you store the outputs in a database (or even in a file) you won't have the memory-use issues, and the data will be incrementally available as it is created, rather than just at the end. Whether the database version is faster depends primarily on how much memory you are using: does the reduction in garbage collection pay for the cost of writing to disk?
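
For example, here is a hedged sketch of the file-based version, writing each dataset out as soon as it is acquired so that nothing is lost if the run dies partway through (saveRDS/readRDS are base R; the cache directory and fetch_one are assumptions):

fetch_and_cache <- function(datasets, dir = "cache") {
  dir.create(dir, showWarnings = FALSE)
  for (name in datasets) {
    out <- file.path(dir, paste0(name, ".rds"))
    if (file.exists(out)) next            # already acquired on an earlier run
    saveRDS(fetch_one(name), out)         # fetch_one stands in for the inner function
  }
}

# After a crash or timeout, reload whatever finished:
# readRDS(file.path("cache", "one.rds"))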

There are both time and memory profilers in R, so you can see empirically what the impacts are.
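
For instance, a rough way to measure both at once (Rprof, summaryRprof and gc are base R; memory profiling in Rprof needs R compiled with that support, which the standard binaries have; outerfunc is just a placeholder for whichever version you are comparing):

gc(reset = TRUE)                          # reset the "max used" counters
Rprof("prof.out", memory.profiling = TRUE)
results <- outerfunc(datasets)            # the code being measured
Rprof(NULL)                               # stop profiling
summaryRprof("prof.out", memory = "both") # time plus memory per call
gc()                                      # "max used" columns show peak memory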

A: 

I'm not sure I understand the question, but I have a couple of solutions.

  1. Inside the function, create a list of the vectors and return that.

  2. Inside the function, create an environment and store all the vectors inside of that. Just make sure that you return the environment in case of errors.

in R:

help(environment)

# You might do something like this:



outer <- function(datasets) {
  # create the return environment
  ret.env <- new.env()
  for(set in datasets) {
    tmp <- inner(set)
    # check for errors however you like here.  You might have inner return a list, and
    # have the list contain an error component
    assign(set, tmp, envir=ret.env)
  }
  return(ret.env)
}

# The inner function might be defined like this

inner <- function(dataset) {
  # I don't know what you are doing here, but let's pretend you are reading a data file
  # that is named by dataset
  filedata <- read.table(dataset, header=TRUE)
  return(filedata)
}

leif
+1  A: 

Remember your Knuth: "Premature optimization is the root of all evil."

Try the side-effect-free version. See if it meets your performance goals. If it does, great: you don't have a problem in the first place; if it doesn't, then use the side effects, and make a note for the next programmer that your hand was forced.

Rob Hansen
A: 

Thank you all for your informative and helpful answers!

Thanks also for the (unintentionally?) humorous line "I'm not sure I understand the question, but I have a couple of solutions"! Put a smile on my face.

A: 

FYI, here's a full sample toy solution that avoids side effects:

outerfunc <- function(names) {
  templist <- list()
  for (aname in names) {
    templist[[aname]] <- innerfunc(aname)
  }
  templist
}

innerfunc <- function(aname) {
  retval <- NULL
  if ("one" %in% aname) retval <- c(1)
  if ("two" %in% aname) retval <- c(1,2)
  if ("three" %in% aname) retval <- c(1,2,3)
  retval
}

names <- c("one","two","three")

name_vals <- outerfunc(names)

# If you really want each dataset as its own top-level variable, do it explicitly here at the call site:
for (name in names) assign(name, name_vals[[name]])