
I asked a question yesterday about storing a plot within an object. I tried implementing the first approach (I'm aware that my original question did not say I was using qplot()) and noticed that it did not work as expected.

library(ggplot2)                   # add ggplot2

string = "C:/example.pdf"          # Set up the PDF device
pdf(string, height = 6, width = 9)

x_range <- range(1, 50)            # Specify range

# Create a list to hold the plot objects.
pltList <- list()

for (i in 1:16) {

    # Organise data
    y = (1:50) * i * 1000                       # Get y col
    x = (1:50)                                  # Get x col
    y = log(y)                                  # Use natural log

    # Regression
    lm.0 = lm(formula = y ~ x)                  # Make linear model
    inter = summary(lm.0)$coefficients[1, 1]    # Get intercept
    slop = summary(lm.0)$coefficients[2, 1]     # Get slope

    # Make plot name
    pltName <- paste('a', i, sep = '')

    # Make plot object
    p <- qplot(
        x, y,
        xlab = "Radius [km]",
        ylab = "Services [log]",
        xlim = x_range,
        main = paste("Sample", i)
    ) + geom_abline(intercept = inter, slope = slop, colour = "red", size = 1)

    print(p)

    pltList[[pltName]] = p
}

# Close the PDF file
dev.off()

I have used sample numbers so the code runs if it is simply copied. I spent a few hours puzzling over this but cannot figure out what is going wrong. The first PDF is written without problems: it contains 16 pages with the correct plots.

Then when I use this piece of code:

library(grid)                      # grid.newpage(), viewport(), grid.layout()

string = "C:/test_tabloid.pdf"
pdf(string, height = 11, width = 17)

grid.newpage()
pushViewport(viewport(layout = grid.layout(3, 3)))

vplayout <- function(x, y) { viewport(layout.pos.row = x, layout.pos.col = y) }

counter = 1

# Page 1
for (i in 1:3) {
    for (j in 1:3) {
        pltName <- paste('a', counter, sep = '')
        print(pltList[[pltName]], vp = vplayout(i, j))
        counter = counter + 1
    }
}

dev.off()

every graph shows the same data (that of the last plot); only the linear model line (abline) differs from graph to graph. When I check my list of plots, it seems that all of them have been overwritten by the most recent plot, with the exception of the abline layer.

A less important secondary question was how to generate a multi-page PDF with several plots on each page, but the main goal of my code was to store the plots in a list that I could access at a later date.

Thank you.

+1  A: 

For your second question: Multi-page pdfs are easy -- see help(pdf):

 onefile: logical: if true (the default) allow multiple figures in one
          file.  If false, generate a file with name containing the
          page number for each page.  Defaults to ‘TRUE’.
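As a minimal base-graphics sketch of that default (not part of the original answer; `tempfile()` replaces the question's `C:/` paths as an assumption), `par(mfrow = c(3, 3))` places nine plots on each page, and the tenth plot automatically starts page two of the same PDF:

```r
# Write 16 small plots, 9 per page, into ONE multi-page PDF.
out_file <- tempfile(fileext = ".pdf")
pdf(out_file, height = 11, width = 17, onefile = TRUE)  # onefile = TRUE is the default
par(mfrow = c(3, 3))  # 3 x 3 grid of panels on each page

for (i in 1:16) {
    x <- 1:50
    y <- log(x * i * 1000)
    plot(x, y, main = paste("Sample", i),
         xlab = "Radius [km]", ylab = "Services [log]")
    abline(lm(y ~ x), col = "red", lwd = 2)  # fitted regression line
}

dev.off()  # close the device so the file is complete
```

Plots 10 to 16 end up on the second page; with `onefile = FALSE` and a file name containing a page-number format such as `%03d`, each page would instead go to its own file.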

For your main question, I don't understand whether you want to store the plot inputs in a list for later processing, or the plot outputs. If it is the latter, I am not sure that plot() returns an object you can store and retrieve.

Dirk Eddelbuettel
I was hoping to store the plot outputs. If I store the plot inputs, does that include the values of x and y at that particular time?
womble
Of course. Just store all function arguments etc. in a list. That is very standard. But your assumption of storing _plot output_ is not. The plot results are device-dependent and most likely OS-dependent. Just write to a file, possibly a bitmap, and display that. Or write GUI-style apps. Or just open multiple plot windows.
Dirk Eddelbuettel
However, `ggplot` may well return objects to you. In which case Eduardo's answer is your key.
Dirk Eddelbuettel
ggplot graphs are indeed constructed by creating a specification, which is then converted into a graph by the print.ggplot() method. So you can (handily!) create a ggplot object without printing it, then use ggsave() to save it to a PDF directly. Or use save() to just save the data structure.
Harlan
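A minimal sketch of the "store the inputs" approach from these comments, using base graphics: each iteration copies its current `x` and `y` values into a list of arguments, and `do.call()` replays any stored plot later. The element names (`a1`, `a2`, ...) follow the question; `plot_args` and the temporary output file are illustrative assumptions.

```r
# Store the *inputs* of each plot (its arguments), not the drawn output.
plot_args <- list()
for (i in 1:16) {
    x <- 1:50
    y <- log(x * i * 1000)
    plot_args[[paste0("a", i)]] <- list(
        x = x, y = y,                 # the current values are copied into the list
        main = paste("Sample", i),
        xlab = "Radius [km]", ylab = "Services [log]"
    )
}

# Later, even after x and y have changed, redraw any plot from its arguments.
x <- y <- NULL
out_file <- tempfile(fileext = ".pdf")
pdf(out_file)
do.call(plot, plot_args[["a3"]])      # same as plot(x = ..., y = ..., main = ...)
dev.off()
```

Because the values (not the variable names) are stored, this answers the follow-up question: yes, the inputs capture x and y as they were on that iteration.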
+2  A: 

There is a bug in your code concerning list subscripting. It should be

pltList[[pltName]]

not

pltList[pltName]

Note:

class(pltList[1])
[1] "list"

pltList[1] is a list containing the first element of pltList.

class(pltList[[1]])
[1] "ggplot"

pltList[[1]] is the first element of pltList.

Eduardo Leoni
Sorry - I made a mistake in what I pasted. I did not fully understand the difference between the two syntaxes and had been editing the code to see the difference. However, my error still occurs as described above.
womble
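To see why the subscript matters when assigning, here is a small sketch with a plain list standing in for a plot object (ggplot objects of that era were themselves lists). The names `p`, `right` and `wrong` are illustrative only.

```r
# A plain list standing in for a plot object.
p <- list(data = data.frame(x = 1:3), title = "Sample 1")

right <- list()
right[["a1"]] <- p              # [[<- stores the whole object as element "a1"

class(right["a1"])              # "list": a one-element sub-list wrapping the object
identical(right[["a1"]], p)     # TRUE: [[ returns the stored object itself

wrong <- list()
# [<- spreads the right-hand list over the selected slot: only p[[1]] (the data
# frame) is kept, and R warns that the replacement length does not match.
wrong["a1"] <- p
```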
+1  A: 

Another suggestion regarding your second question would be to use either Sweave or Brew as they will give you complete control over how you display your multi-page pdf.

Have a look at this related question.

Shane
+4  A: 

I think you should use the data argument in qplot, i.e., store your vectors in a data frame.

See Hadley's book, Section 4.4:

The restriction on the data is simple: it must be a data frame. This is restrictive, and unlike other graphics packages in R. Lattice functions can take an optional data frame or use vectors directly from the global environment. ...

The data is stored in the plot object as a copy, not a reference. This has two important consequences: if your data changes, the plot will not; and ggplot2 objects are entirely self-contained so that they can be save()d to disk and later load()ed and plotted without needing anything else from that session.

rcs
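A sketch of the data-frame approach, written for a recent ggplot2 release with `ggplot()` rather than `qplot()` (which newer versions deprecate): each plot gets its own data frame, so overwriting the loop variable afterwards does not change the stored plots, and the list survives a `save()`/`load()` round trip. The temporary file names and the choice of plot `a2` are assumptions for illustration; `ggsave()` writes a stored plot to PDF without printing it first, as mentioned in the comments on the first answer.

```r
library(ggplot2)

plt_list <- list()
for (i in 1:16) {
    # Each iteration gets its own data frame; ggplot() keeps a copy of it.
    df <- data.frame(x = 1:50, y = log((1:50) * i * 1000))
    fit <- lm(y ~ x, data = df)

    plt_list[[paste0("a", i)]] <- ggplot(df, aes(x = x, y = y)) +
        geom_point() +
        geom_abline(intercept = coef(fit)[[1]], slope = coef(fit)[[2]],
                    colour = "red") +
        labs(title = paste("Sample", i), x = "Radius [km]", y = "Services [log]")
}

# Overwrite the loop variable; the stored plots are unaffected.
df <- data.frame(x = 1:50, y = 0)

# Round-trip the list through save()/load() into a fresh environment.
rdata_file <- tempfile(fileext = ".RData")
save(plt_list, file = rdata_file)
restored <- new.env()
load(rdata_file, envir = restored)

# Write one stored plot straight to a PDF file.
pdf_file <- tempfile(fileext = ".pdf")
ggsave(pdf_file, plot = restored$plt_list[["a2"]], width = 9, height = 6)
```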
+5  A: 

Ok, so if your plot command is changed to

p <- qplot(data = data.frame(x = x, y = y),
           x, y,   
           xlab = "Radius [km]", 
           ylab = "Services [log]",
           xlim = x_range,
           ylim = c(0,10),
           main = paste("Sample",i)
           ) + geom_abline(intercept = inter, slope = slop, colour = "red", size = 1)

then everything works as expected. Here's what I suspect is happening (although Hadley could probably clarify things). When ggplot2 "saves" the data, what it actually does is save a data frame, and the names of the parameters. So for the command as I have given it, you get

> summary(pltList[["a1"]])
data: x, y [50x2]
mapping:  x = x, y = y
scales:   x, y 
faceting: facet_grid(. ~ ., FALSE)
-----------------------------------
geom_point:  
stat_identity:  
position_identity: (width = NULL, height = NULL)

mapping: group = 1 
geom_abline: colour = red, size = 1 
stat_abline: intercept = 2.55595281266726, slope = 0.05543539319091 
position_identity: (width = NULL, height = NULL)

However, if you don't specify a data parameter in qplot, all the variables get evaluated in the current scope, because there is no attached (read: saved) data frame.

data: [0x0]
mapping:  x = x, y = y
scales:   x, y 
faceting: facet_grid(. ~ ., FALSE)
-----------------------------------
geom_point:  
stat_identity:  
position_identity: (width = NULL, height = NULL)

mapping: group = 1 
geom_abline: colour = red, size = 1 
stat_abline: intercept = 2.55595281266726, slope = 0.05543539319091 
position_identity: (width = NULL, height = NULL)

So when the plot is generated the second time around, rather than using the original values, it uses the current values of x and y.

Jonathan Chang
Thanks, rcs and Jonathan; this fixed the problem. I was unaware of the data argument and how it could be used to store the data. I'm examining that section of the book now.
womble
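The behaviour Jonathan describes can be reproduced without ggplot2. The base-R sketch below is an analogy, not ggplot2's actual internals: `by_reference` keeps an unevaluated expression that looks up `x` and `y` only when it is finally evaluated (like a mapping with no data attached), while `by_value` copies the values on each iteration (like passing `data = data.frame(x = x, y = y)`).

```r
by_reference <- list()  # keeps only the *names* x and y, to be looked up later
by_value     <- list()  # keeps a copy of the values as they are right now

for (i in 1:3) {
    x <- 1:5
    y <- x * i
    by_reference[[i]] <- quote(data.frame(x = x, y = y))  # not evaluated yet
    by_value[[i]]     <- data.frame(x = x, y = y)         # evaluated and copied now
}

# "Printing" later: each expression is evaluated where x and y live, and by
# now they hold the values from the LAST iteration (i = 3).
here <- environment()
later_ref <- lapply(by_reference, eval, envir = here)
sapply(later_ref, function(d) d$y[1])   # 3 3 3: every entry sees the last y
sapply(by_value,  function(d) d$y[1])   # 1 2 3: each entry kept its own y
```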