I like the idea of making research available at multiple levels of detail, i.e., an abstract for the casually curious, the full text for the more interested, and finally the data and code for those working in the same area or trying to reproduce your results. Between the text level and the data/code level, I'd like to insert another layer. Namely, I'd like to create a kind of automatically generated appendix that contains the full regression output, diagnostic plots, exploratory graphs, data profiles, etc. from the analysis, regardless of whether those plots and regressions made it into the final paper.

One idea I had was to write a script that would examine the .Rnw file and automatically:

  • Profile all data sets that are loaded (sort of like the Hmisc(?) package)
  • Summarize all regressions - i.e., run summary(model) for all models
  • Present all plots (regardless of whether they made it in the final version)

The idea is to make this a low-effort, push-button sort of thing, as opposed to a formal appendix written like the rest of a paper. What I'm looking for is some ideas on how to do this in R in a relatively simple way. My hunch is that there is some way of going through the namespace, figuring out what each object is, and then dumping it into a PDF.
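To make the hunch concrete, here is a minimal sketch of the "walk the namespace" idea. It assumes the analysis objects live in a single environment and writes a plain-text log rather than a PDF. The function name `dump_appendix` and the default file name are made up for illustration, and plots are left out because base graphics plots are not stored as objects that could be found this way.

```r
# Hypothetical helper (not from any package): log a profile of every data
# frame and a summary of every fitted lm/glm found in an environment.
dump_appendix <- function(env = globalenv(), file = "appendix.txt") {
  out <- character(0)
  for (name in sort(ls(env))) {
    obj <- get(name, envir = env)
    if (is.data.frame(obj)) {
      header <- paste0("== Data set: ", name, " ==")
      # str() gives the structure, summary() the per-column profile
      body <- capture.output(str(obj), summary(obj))
    } else if (inherits(obj, "lm")) {
      # glm objects also inherit from "lm", so they are covered here too
      header <- paste0("== Model: ", name, " ==")
      body <- capture.output(print(summary(obj)))
    } else {
      next  # anything else (vectors, functions, ...) is skipped
    }
    out <- c(out, header, body, "")
  }
  writeLines(out, file)
  invisible(out)
}
```

Calling `dump_appendix()` at the end of a session would log whatever data frames and models are still in the global environment. Other model classes would need their own branch.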

Thoughts? Does something like this already exist?

+1  A: 

John, this sounds interesting, but if you provide the data and the article is formatted in Sweave, wouldn't this long log file be redundant?

Back to your question: one package you might want to look into is Zelig, since it "automates the creation of replication data files so that you (or, if you wish, anyone else) can replicate the results of your analyses (hence satisfying the replication standard)" - http://goo.gl/FPHU - not what you are looking for, but the concept of replication data files might give you some other ideas. Notice that multiple journals are now using replication data files.

Ricardo Pietrobon
Thanks for the pointer to Zelig - I hadn't seen that before. I'll give you an example of what I'm trying to do: Suppose you write in the text something like "an auxiliary regression shows that uptake was not correlated with gender" - suppose this fact is all that matters and the coefficient is not important - in fact, this might just be a parenthetical remark or a footnote. But I did in fact run m <- lm(I(y > 0) ~ gender, data = data) and I'd like to put this in the web-appendix/log type file for the really interested parties. See what I mean?
John Horton
+1  A: 

We made an attempt at this with our recent JASA article: http://hdl.handle.net/1902.1/12174. You should be able to "make" the whole paper. One thing to notice about our reproduction archive: we packaged versions of the R packages that we used. It turned out that as people improve their packages, sometimes they change defaults --- which would break our build. Perhaps in the future one might distribute an entire virtual machine including the R binary which would be called [recall how round(x, digits=) lost its named arguments and became positional from one version of R to the next -- making round(digits=, x) provide nonsense results without warning?].
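Short of bundling packages or a whole virtual machine, a lighter-weight complement is to record which versions a build actually used. The sketch below is only an illustration of that idea, not what the archive above does; the function name `record_versions` and the default file name are made up.

```r
# Hypothetical helper: write the R version and the versions of all attached
# packages to a text file that can be shipped with the paper.
record_versions <- function(file = "versions.txt") {
  info <- sessionInfo()
  # basePkgs are the attached base packages; otherPkgs are attached add-ons
  pkgs <- c(info$basePkgs, names(info$otherPkgs))
  versions <- vapply(pkgs, function(p) as.character(packageVersion(p)),
                     character(1))
  lines <- c(R.version.string, paste(pkgs, versions))
  writeLines(lines, file)
  invisible(lines)
}
```

Running this at the end of the build gives readers a record to compare against when a changed default breaks a rerun.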

Anyway, this is our first attempt at such a complex document. I have a smaller version here http://hdl.handle.net/1902.1/13376 which does not use make.

Jake