Background:

I'm running a Monte Carlo simulation to show that a particular process (a cumulative mean) does not converge over time, and often diverges wildly in simulation (the expectation of the random variable = infinity). I want to plot about 10 of these simulations on a line chart, where the x axis has the iteration number, and the y axis has the cumulative mean up to that point.

Here's my problem:

I'll run the first simulation (each sim. having 10,000 iterations), and build the main plot based on its current range. But often one of the later simulations will have a range a few orders of magnitude larger than the first one, so the plot flies outside of the original range. So, is there any way to dynamically update the ylim or xlim of a plot upon adding a new set of points or lines?

I can think of two workarounds for this:

1. Store each simulation, then pick the one with the largest range, and build the base graph off of that. Not elegant, and I'd have to store a lot of data in memory, but it would probably be laptop-friendly. [[EDIT: as Marek points out, this is not a memory-intense example, but if you know of a nice solution that would support far more iterations, such that memory becomes an issue (think high-dimensional walks that require much, much larger MC samples for convergence), then jump right in.]]
2. Find a seed that appears to build a nice-looking version of the plot, and set the ylim manually, which would make the demonstration reproducible.
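The first workaround can be sketched in base R. This is only an illustrative sketch, not the original code: the Cauchy distribution stands in for the heavy-tailed variable (its mean does not exist), and the simulation sizes are the ones described above.

```r
set.seed(1)      # any seed; chosen arbitrarily for reproducibility
n.sims <- 10
n.iter <- 10000

# Each column holds one simulation's cumulative mean path
paths <- replicate(n.sims, cumsum(rcauchy(n.iter)) / seq_len(n.iter))

# Build the base plot once, from the global range, then add the rest
plot(paths[, 1], type = "l", ylim = range(paths),
     xlab = "Iteration", ylab = "Cumulative mean")
for (i in 2:n.sims) lines(paths[, i], col = i)
```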

Naturally I'm holding out for something more elegant than my workarounds. Hoping this isn't too pedestrian a problem, since I imagine it's not uncommon with simulations in R. Any ideas?

A: 
Vince
A solution that uses base graphics and is lighter on memory would be to track the X and Y max, and save the full data set of each run to a file. When you complete a new run, if its range is larger, redo the plot and then loop through the stored data files.
Peter
Instead of checking (and possibly replotting) after each trial, one could save the data and store the ranges, then find the global range, use it as ylim, and plot all the results at once (for the first time).
Marek
Part of my backing of this approach was Marek's comment above, that this isn't *that* much data. 7MB of RAM. Compared to the genome assembly memory requirements I've seen recently, MCMC sims are a drop in the bucket!
Vince
Yep - this example doesn't require me to worry about memory, even on a crappy laptop. Let's see what y'all say when I bring in a far bigger fish :)
HamiltonUlmer
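The save-to-disk, replot-on-exceed idea from the comments above might look like this in base R. Everything here is an illustrative assumption: the sim() helper stands in for one Monte Carlo run, runs go to temporary .rds files, and only the running y-range stays in memory.

```r
sim <- function(n) cumsum(rcauchy(n)) / seq_len(n)  # one MC run (illustrative)

set.seed(42)
files <- character(0)
y.range <- NULL
for (i in 1:10) {
  y <- sim(10000)
  f <- tempfile(fileext = ".rds")
  saveRDS(y, f)                 # full run goes to disk, not RAM
  files <- c(files, f)
  new.range <- range(y.range, y)
  if (i == 1 || !identical(new.range, y.range)) {
    # Range grew: rebuild the plot from the stored runs
    y.range <- new.range
    plot(NULL, xlim = c(1, 10000), ylim = y.range,
         xlab = "Iteration", ylab = "Cumulative mean")
    for (f in files) lines(readRDS(f))
  } else {
    lines(y)                    # range unchanged: just add the new run
  }
}
```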
+4  A: 

I'm not sure if this is possible using base graphics; if someone has a solution I'd love to see it. However, graphics systems based on grid (lattice and ggplot2) allow the graphics object to be saved and updated. It's insanely easy in ggplot2.

library(ggplot2)

make some data:

foo <- data.frame(data = rnorm(100), numb = seq_len(100))

make an initial ggplot object and plot it:

p <- ggplot(foo, aes(numb, data)) + geom_line()
p

make some more data and add it to the plot

foo <- data.frame(data = rnorm(200), numb = seq_len(200))

p <- p + geom_line(data = foo, colour = "red")

plot the new object

p
Peter
This is a good solution, and proof I need to use ggplot2 more. Using rnorm(200, 0, 1000) in the second foo assignment really shows that this works beautifully :-)
Vince
Now that I think about this a bit more, this won't help w/ a memory issue if there is one (all that ggplot object data has to live somewhere.)
Peter
Given this particular context (simple, short simulations), using ggplot is the nicest approach. And the fact is, ggplot2 is really nice for other reasons, so an answer that uses it is all right in my book. I'll potentially ask this again if/when memory IS an issue.
HamiltonUlmer
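Tying the answer back to the question: the same incremental pattern handles the cumulative-mean simulations directly, since ggplot2 recomputes the axis limits from all layers each time the object is printed. The cummean.sim() helper and the Cauchy draws below are illustrative assumptions, not from the original post.

```r
library(ggplot2)

# One cumulative-mean run of a heavy-tailed variable (illustrative)
cummean.sim <- function(n) {
  data.frame(numb = seq_len(n), data = cumsum(rcauchy(n)) / seq_len(n))
}

set.seed(1)
cols <- rainbow(10)
p <- ggplot(cummean.sim(10000), aes(numb, data)) + geom_line(colour = cols[1])
for (i in 2:10) {
  # Each new simulation becomes a new layer; limits adapt on printing
  p <- p + geom_line(data = cummean.sim(10000), colour = cols[i])
}
p
```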