ansaurus

Question

Answer 1

+3 A:

I tend to use png files rather than vector-based graphics such as pdf or eps in this situation. The files are much smaller, although you lose resolution.
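
As a rough illustration of the size difference, the sketch below writes the same base-graphics scatterplot to a PDF and to a PNG and prints both file sizes. The temporary file names and the 800x800 pixel size are arbitrary choices made for this example, and the actual numbers will vary with the data and device settings.

```r
# Sketch: the same plot written to a vector (PDF) and a raster (PNG) device.
set.seed(1)
x <- rnorm(10000); y <- rnorm(10000)

pdf_file <- tempfile(fileext = ".pdf")
png_file <- tempfile(fileext = ".png")

pdf(pdf_file)                             # vector: every point is stored
plot(x, y)
dev.off()

png(png_file, width = 800, height = 800)  # raster: fixed pixel grid
plot(x, y)
dev.off()

# File sizes in bytes, for comparison
sizes <- file.size(c(pdf_file, png_file))
names(sizes) <- c("pdf", "png")
print(sizes)
```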

If it's a more conventional scatterplot, then using semi-transparent colours also helps, as well as solving the over-plotting problem. For example,

library(ggplot2)  # provides qplot() and alpha()
x <- rnorm(10000); y <- rnorm(10000)
qplot(x, y, colour = I(alpha("blue", 1/25)))

Rob Hyndman 2009-12-26 10:19:06

Answer 2

+3 A:

Beyond Rob's suggestions, one plot function I like as it does the 'thinning' for you is hexbin; an example is at the R Graph Gallery.
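
A minimal sketch of the hexbin approach, assuming the hexbin package from CRAN is installed. The simulated data and the choice of xbins = 40 are only for illustration:

```r
library(hexbin)  # CRAN package; assumed to be installed

set.seed(1)
x <- rnorm(10000); y <- rnorm(10000)

# Bin the points into hexagonal cells; xbins = 40 is an arbitrary choice
hb <- hexbin(x, y, xbins = 40)

# Draws one hexagon per non-empty cell instead of 10000 individual points
plot(hb)
```

The plot then contains at most as many shapes as there are non-empty cells, regardless of how many points went in.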

Dirk Eddelbuettel 2009-12-26 14:14:17

Or, with ggplot2, `geom = "hex"`

hadley 2009-12-26 17:24:43

Answer 3

+1 A:

Here is one possible way to downsample a plot along the x axis when that axis is log-transformed. It takes the log of x, rounds that value to define bins, and keeps the point with the median x value in each bin:

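downsampled_qplot <- function(x, y, data, rounding = 0, ...) {
  # assumes log = "xy" or log = "x"; the x and y arguments are never
  # evaluated, the columns data$x and data$count are used directly
  group <- factor(round(log(data$x), rounding))
  # keep the row with the median x in each bin
  # (nrow() counts rows; length() on a data frame counts columns)
  d <- do.call(rbind, by(data, group,
    function(X) X[order(X$x)[ceiling(nrow(X) / 2)], ]))
  qplot(x, count, data = d, ...)
}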

Using the definition of ccdf() from above, we can then compare the original plot of the CCDF of the distribution with the downsampled version:

myccdf=ccdf(rlnorm(10000,3,2.4))

qplot(x,count,data=myccdf,log='xy',main='original')

downsampled_qplot(x,count,data=myccdf,log='xy',rounding=1,main='rounding = 1')

downsampled_qplot(x,count,data=myccdf,log='xy',rounding=0,main='rounding = 0')

In PDF format, the original plot takes up 640K, and the downsampled versions occupy 20K and 8K, respectively.

eytan 2009-12-26 18:53:14

Rather than rounding, one could more generally do something like: group = cut(log(data$x), breaks=maxpoints)
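
A base-R sketch of cut-based grouping, under a few assumptions: downsample_by_cut is a hypothetical helper name, the data frame has a strictly positive column x, and the toy y column exists only to make the example self-contained.

```r
# Hypothetical helper: keep at most `maxpoints` rows, one per log-x bin.
downsample_by_cut <- function(data, maxpoints = 100) {
  # cut() splits the range of log(x) into `maxpoints` equal-width intervals
  group <- cut(log(data$x), breaks = maxpoints)
  # within each non-empty bin, keep the row with the median x
  pieces <- lapply(split(data, group, drop = TRUE), function(X) {
    X[order(X$x)[ceiling(nrow(X) / 2)], , drop = FALSE]
  })
  do.call(rbind, pieces)
}

set.seed(1)
d <- data.frame(x = rlnorm(10000, 3, 2.4))
d$y <- rank(-d$x) / nrow(d)   # toy y column, only for illustration
small <- downsample_by_cut(d, maxpoints = 50)
nrow(small)                   # at most 50, one row per non-empty bin
```

Unlike rounding, this puts a direct upper bound on the number of plotted points, since empty bins contribute nothing and every other bin contributes exactly one row.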

eytan 2009-12-26 19:14:17

Answer 4

+2 A:

I'd either make image files (png or jpeg devices), as Rob already mentioned, or I'd make a 2D histogram. An alternative to the 2D histogram is a smoothed scatterplot, which produces a similar graphic but with a smoother transition from dense to sparse regions.

If you've never seen addictedtor before, it's worth a look. It has some very nice graphics generated in R with images and sample code.

Here's the sample code from the addictedtor site:

2-d histogram:

require(gplots) 

# example data, bivariate normal, no correlation
x <- rnorm(2000, sd=4) 
y <- rnorm(2000, sd=1) 

# separate scales for each axis, this looks circular
hist2d(x,y, nbins=50, col = c("white",heat.colors(16))) 
rug(x,side=1) 
rug(y,side=2) 
box()

smoothscatter:

library("geneplotter")  ## from BioConductor
require("RColorBrewer") ## from CRAN

x1  <- matrix(rnorm(1e4), ncol=2)
x2  <- matrix(rnorm(1e4, mean=3, sd=1.5), ncol=2)
x   <- rbind(x1,x2)

layout(matrix(1:4, ncol=2, byrow=TRUE))
op <- par(mar=rep(2,4))
smoothScatter(x, nrpoints=0)
smoothScatter(x)
smoothScatter(x, nrpoints=Inf,
              colramp=colorRampPalette(brewer.pal(9,"YlOrRd")),
              bandwidth=40)
colors  <- densCols(x)
plot(x, col=colors, pch=20)

par(op)

James Thompson 2009-12-27 06:23:24

maximum plot points in R?
