I've been working on a parallel lapply()-style function that uses Amazon's Elastic MapReduce engine as the 'grid' for processing (yes, it's a mapper with no reducer). Once the code is stable I'll abstract it as a foreach backend, but first I need to build tests for the code I have.

What would be some good test cases for this function?

My canonical test case right now is the following:

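# Build a list of 10 numeric vectors, each ending in an NA
myList <- vector("list", 10)
set.seed(1)
for (i in 1:10) {
  myList[[i]] <- c(rnorm(999), NA)
}
# Run the same call locally and on the cluster, then compare the results
outputLocal <- lapply(myList, mean, na.rm = TRUE)
outputEmr   <- emrlapply(myList, mean, myCluster, na.rm = TRUE)
all.equal(outputEmr, outputLocal)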

This test case makes sure the optional argument na.rm=TRUE is passed properly to the remote machines. What other test cases should I be using? I don't currently support the simplify or USE.NAMES arguments, although I will in the future.
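To make the question concrete, here is a rough sketch of the kind of cases I have in mind so far: an empty list, a named list, mixed element types (including NULL), non-scalar results, a closure that captures a variable, extra arguments passed through `...`, and an atomic vector as input. The helper `checkAgainstLapply` and the variable `parFun` are hypothetical names made up for this sketch, and plain `lapply()` stands in for `emrlapply()` so that it runs without a cluster.

```r
# Hypothetical harness: `parFun` is any function with an lapply()-like
# signature (X, FUN, ...). It returns TRUE when parFun agrees with lapply().
checkAgainstLapply <- function(parFun, X, FUN, ...) {
  isTRUE(all.equal(parFun(X, FUN, ...), lapply(X, FUN, ...)))
}

# Stand-in so the sketch runs locally. To test the real thing, something like
#   parFun <- function(X, FUN, ...) emrlapply(X, FUN, myCluster, ...)
# could be swapped in (assuming the argument order used in my test above).
parFun <- lapply

offset <- 10  # free variable captured by the closure case below

results <- c(
  emptyList    = checkAgainstLapply(parFun, list(), mean),
  namedList    = checkAgainstLapply(parFun, list(a = 1:3, b = 4:6), sum),
  mixedTypes   = checkAgainstLapply(parFun, list(1L, "a", TRUE, NULL), class),
  vectorResult = checkAgainstLapply(parFun, list(1:10, 11:20), range),
  dataFrames   = checkAgainstLapply(parFun, list(head(cars), tail(cars)), colMeans),
  closure      = checkAgainstLapply(parFun, as.list(1:5), function(x) x + offset),
  extraArgs    = checkAgainstLapply(parFun, list(c(1.234, 5.678)), round, digits = 1),
  atomicInput  = checkAgainstLapply(parFun, 1:5, function(x) x^2)
)
print(results)
```

The closure case seems the most likely to trip up a remote backend, since `offset` has to travel to the workers along with the function. Functions that draw random numbers would presumably need separate handling, because remote results would not match a local run unless seeding is coordinated.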