...besides the fact that Rscript is invoked with #!/usr/bin/env Rscript and littler with #!/usr/local/bin/r (on my system) in the first line of the script file. I've found a difference in execution speed (littler seems to be a bit slower).

I created two dummy scripts, ran each 1000 times, and compared the average execution times.

Here's the Rscript file:

#!/usr/bin/env Rscript

btime <- proc.time()
x <- rnorm(100)
print(x)
print(plot(x))
etime <- proc.time()
tm <- etime - btime
sink(file = "rscript.r.out", append = TRUE)
cat(paste(tm[1:3], collapse = ";"), "\n")
sink()
print(tm)

and here's the littler file:

#!/usr/local/bin/r

btime <- proc.time()
x <- rnorm(100)
print(x)
print(plot(x))
etime <- proc.time()
tm <- etime - btime
sink(file = "little.r.out", append = TRUE)
cat(paste(tm[1:3], collapse = ";"), "\n")
sink()
print(tm)

As you can see, they are almost identical (only the first line and the sink file argument differ). Each run appends its timings to a text file via sink(), which I then imported into R with read.table. I wrote a bash script to execute each script 1000 times, then calculated the averages.

Here's the bash script:

for i in `seq 1000`
do
./$1
echo "####################"
echo "Iteration #$i"
echo "####################"
done
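
The averaging step can also be sketched in the shell instead of R. This is only an illustration, assuming each line of the sink file has the form user;system;elapsed as written by the cat() call in the scripts; the function name col_means is made up for this example.

```shell
# Hypothetical helper: average the three ';'-separated timing columns
# (user, system, elapsed) of a sink file such as rscript.r.out.
col_means() {
  awk -F';' '
    # Only count lines that actually carry three fields.
    NF >= 3 { user += $1; sys += $2; elapsed += $3; n++ }
    END {
      if (n == 0) exit 1   # nothing to average
      printf "user=%.6f system=%.6f elapsed=%.6f\n", user / n, sys / n, elapsed / n
    }' "$1"
}
```

It would be called as col_means rscript.r.out or col_means little.r.out once the loop has finished.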

And the results are:

# littler script
> mean(lit)
    user   system  elapsed 
0.489327 0.035458 0.588647 
> sapply(lit, median)
   L1    L2    L3 
0.490 0.036 0.609 
# Rscript
> mean(rsc)
    user   system  elapsed 
0.219334 0.008042 0.274017 
> sapply(rsc, median)
   R1    R2    R3 
0.220 0.007 0.258 

Long story short: besides the (obvious) execution-time difference, are there any other differences? More importantly: why should or shouldn't you prefer littler over Rscript (or vice versa)?

+6  A: 

Couple quick comments:

  1. The path /usr/local/bin/r is arbitrary; you can use /usr/bin/env r as well, as we do in some examples. As I recall, that limits what other arguments you can give to r, as it takes only one when invoked via env.

  2. I don't understand your benchmark, and why you'd do it that way. We do have timing comparisons in the sources, see tests/timing.sh and tests/timing2.sh. Maybe you want to split the test between startup and graph creation or whatever you are after.

  3. Whenever we ran those tests, littler won. (It still won when I re-ran those right now.) That made sense to us because, if you look at the sources to Rscript.exe, it works differently: it sets up the environment and a command string before eventually calling execv(cmd, av). littler can start a little quicker.

  4. The main price is portability. The way littler is built, it won't make it to Windows. Or at least not easily. OTOH we have RInside ported, so if someone really wanted to...

  5. Littler came first in September 2006 versus Rscript which came with R 2.5.0 in April 2007.

  6. Rscript is now everywhere where R is. That is a big advantage.

  7. Command-line options are a little more sensible for littler in my view.

  8. Both work with CRAN packages getopt and optparse for option parsing.
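
To compare startup cost alone, as points 2 and 3 suggest, a rough shell sketch follows. It is not the tests/timing.sh script from the littler sources; the function name avg_wall_time is hypothetical, and date +%s has only one-second resolution, so the result is meaningful only for a large number of runs.

```shell
# Hypothetical helper: run a command N times and print the average
# wall-clock seconds per run. Resolution is limited by `date +%s`.
avg_wall_time() {
  awt_n=$1; shift
  awt_start=$(date +%s)
  awt_i=0
  while [ "$awt_i" -lt "$awt_n" ]; do
    "$@" >/dev/null 2>&1          # discard the command's own output
    awt_i=$((awt_i + 1))
  done
  awt_end=$(date +%s)
  # Let awk do the floating-point division.
  awk -v t="$((awt_end - awt_start))" -v n="$awt_n" 'BEGIN { printf "%.4f\n", t / n }'
}
```

Assuming both interpreters are on the PATH, avg_wall_time 1000 Rscript -e 'invisible(NULL)' and avg_wall_time 1000 r -e 'invisible(NULL)' would time little more than startup, since the evaluated expression does no work.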

So it's a personal preference. I co-wrote littler, learned a lot doing that (e.g. for RInside) and still find it useful -- so I use it dozens of times each day. It drives CRANberries. It drives cran2deb. Your mileage may, as they say, vary.

Disclaimer: littler is one of my projects.

Postscriptum: I would have written the test as

  # N is the number of replications
  fun <- function() { x <- rnorm(100); print(x); print(plot(x)) }
  replicate(N, system.time( fun() )["elapsed"])

or even

  mean( replicate(N, system.time(fun())["elapsed"]), trim=0.05)

to get rid of the outliers. Moreover, you are essentially only measuring I/O (a print and a plot), both of which come from the R library, so I would expect little difference.
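
For readers unfamiliar with trim=0.05: the trimmed mean sorts the values, drops floor(n * trim) observations from each end, and averages the rest. A shell sketch of the same computation, assuming one number per line on standard input and a trim fraction below 0.5; the function name trimmed_mean is hypothetical:

```shell
# Hypothetical helper: trimmed mean of the numbers on stdin.
# $1 = fraction to drop from EACH end (assumed to be < 0.5).
trimmed_mean() {
  sort -n | awk -v trim="$1" '
    { a[NR] = $1 }                       # values arrive sorted ascending
    END {
      if (NR == 0) exit 1
      k = int(NR * trim)                 # observations dropped per end
      for (i = k + 1; i <= NR - k; i++) { sum += a[i]; n++ }
      printf "%.4f\n", sum / n
    }'
}
```

With the sink files from the question, cut -d';' -f3 rscript.r.out | trimmed_mean 0.05 would give the trimmed mean of the elapsed column.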

Dirk Eddelbuettel
Dirk, thanks for the prompt and thorough answer! I awaited your response with a great deal of anxiety because of your involvement in this project (and yes, I knew that before starting the post). Ad 1: I use ArchLinux, and `whereis r` gives me only `/usr/local/bin/r`. If I put `/usr/bin/env r`, an error occurs. Ad 2: I'll give the tests a try. I know that littler should be faster, and I'm still amazed that littler was slower with graph creation. Ad 3: I don't understand: did you run the scripts from my post and get different results? Can you please post them?
aL3xa
You could have emailed me :) Re 1: No error here; make sure you have the correct modes and everything. Re 2: My tests were about how fast each variant starts; once started, I would expect them to perform the same, as they all use the same underlying R. Re 3: No, I did not use your script; I was just trying to show that one should use `system.time(expression)` rather than the `proc.time()` construct.
Dirk Eddelbuettel
OK, I'll change dummy scripts (named after their author) and see what happens.
aL3xa