I've been working on code to create a parallel lapply()-style function that uses Amazon's Elastic MapReduce engine as the 'grid' for processing (yes, it's a mapper with no reducer). Once the code is stable I'll abstract it as a foreach backend. But first I need to write tests for the code I have.
What would be some good test cases for this function?
My canonical test case right now is the following:
myList <- list()
set.seed(1)
for (i in 1:10) {
  # 999 random draws plus one NA, so the result depends on na.rm
  myList[[i]] <- c(rnorm(999), NA)
}
outputLocal <- lapply(myList, mean, na.rm=TRUE)
outputEmr <- emrlapply(myList, mean, myCluster, na.rm=TRUE)
all.equal(outputEmr, outputLocal)
This test case makes sure the optional na.rm argument is passed properly to the remote machines. What are some other test cases that I could be using? I don't currently support the simplify or USE.NAMES arguments, although I will in the future.
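For anyone who wants to try the comparison without a cluster, here is a minimal local sketch of the same check. standInLapply is a hypothetical stand-in that I am assuming takes the same argument order as emrlapply (X, FUN, cluster, ...); it just forwards to lapply and ignores the cluster, so it only illustrates the shape of the test, not the EMR code itself.

```r
# Hypothetical stand-in with the argument order assumed for emrlapply(X, FUN, cluster, ...).
# It ignores the cluster and runs locally; it is NOT the EMR implementation.
standInLapply <- function(X, FUN, cluster, ...) {
  lapply(X, FUN, ...)
}

# Same data as the canonical test: ten vectors of 999 draws, each ending in an NA.
set.seed(1)
myList <- lapply(1:10, function(i) c(rnorm(999), NA))

outputLocal <- lapply(myList, mean, na.rm = TRUE)
outputStandIn <- standInLapply(myList, mean, cluster = NULL, na.rm = TRUE)

# all.equal() returns a character vector (not FALSE) on a mismatch,
# so wrap it in isTRUE() to get a single logical for a test.
argsForwarded <- isTRUE(all.equal(outputStandIn, outputLocal))

# Negative check: without na.rm, every mean should come back as NA.
withoutNaRm <- standInLapply(myList, mean, cluster = NULL)
allNaWithoutFlag <- all(vapply(withoutNaRm, is.na, logical(1)))
```

The negative check is there because a test that only compares the two outputs would still pass if both sides silently dropped na.rm and returned NA everywhere.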