I have data that is mostly concentrated in a small range (1-10), but a significant number of points (say, 10%) lie in (10-1000). I would like to plot a histogram of this data that focuses on (1-10) but also shows the (10-1000) data. Something like a log scale for the histogram.

Yes, I know this means that not all bins are of equal size.

A simple hist(x) gives one plot (image omitted), while hist(x,breaks=c(0,1,1.1,1.2,1.3,1.4,1.5,1.6,1.7,1.8,1.9,2,3,4,5,7.5,10,15,20,50,100,200,500,1000,10000)) gives another (image omitted),

none of which is what I want.

Update: following the answers here, I now produce something that is almost exactly what I want (I went with a continuous plot instead of a bar histogram):

breaks <- c(0,1,1.1,1.2,1.3,1.4,1.5,1.6,1.7,1.8,1.9,2,4,8)
ggplot(t,aes(x)) + geom_histogram(colour="darkblue", size=1, fill="blue") + scale_x_log10('true size/predicted size', breaks = breaks, labels = breaks)

The only problem is that I'd like the scale to match the bars that are actually plotted. There are two options for doing that. One is to simply use the actual margins of the plotted bars (how?) and accept "ugly" x-axis labels like 1.1754, 1.2985, etc. The other, which I prefer, is to control the actual bin margins so that they match the breaks.

+3  A: 

Log scale histograms are easier with ggplot than with base graphics. Try something like

library(ggplot2)
dfr <- data.frame(x = rlnorm(100, sdlog = 3))
ggplot(dfr, aes(x)) + geom_histogram() + scale_x_log10()
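If the bin edges should coincide with the axis breaks, one possible sketch is to pass the edges to `geom_histogram()` yourself. This relies on `scale_x_log10()` transforming the data before binning, so the `breaks` for the bins are given in log10 units. The sample data, the name `dfr2` and the `edges` vector are assumptions made up for illustration.

```r
library(ggplot2)

# Illustrative data: 90% in (1, 10), 10% in (10, 1000)
set.seed(1)
dfr2 <- data.frame(x = c(runif(900, 1, 10), runif(100, 10, 1000)))

# Hypothetical bin edges on the original scale (all strictly positive)
edges <- c(1, 1.5, 2, 3, 5, 7.5, 10, 20, 50, 100, 200, 500, 1000)

p <- ggplot(dfr2, aes(x)) +
  # Bin edges are given in log10 units because the scale transforms x first
  geom_histogram(breaks = log10(edges), colour = "darkblue", fill = "blue") +
  # Axis ticks are given on the original scale
  scale_x_log10(breaks = edges, labels = as.character(edges))
print(p)

# Inspect the bins that were actually used (xmin/xmax are in log10 units)
bins <- layer_data(p)
```

Note that an edge of 0 cannot be used here, since log10(0) is -Inf.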

If you are desperate for base graphics, you need to plot a log-scale histogram without axes, then manually add the axes afterwards.

h <- hist(log10(dfr$x), axes = FALSE) 
Axis(side = 2)
Axis(at = h$breaks, labels = 10^h$breaks, side = 1)
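If the bins themselves should sit on round values, a sketch along the same lines is to pass log10-transformed edges as `breaks` and label the axis with the untransformed values. The sample data and the `edges` vector are illustrative assumptions. Because these bins are not of equal width on the log scale, `hist` plots densities rather than counts by default.

```r
# Illustrative data: 90% in (1, 10), 10% in (10, 1000)
set.seed(1)
x <- c(runif(900, 1, 10), runif(100, 10, 1000))

# Hypothetical bin edges on the original scale; they must span range(x)
edges <- c(1, 1.5, 2, 3, 5, 7.5, 10, 20, 50, 100, 200, 500, 1000)

# Bin on the log10 scale and suppress the default x axis
h <- hist(log10(x), breaks = log10(edges), xaxt = "n",
          xlab = "x (log scale)", main = "Histogram of x")

# Put the ticks at the bin edges, labelled on the original scale
axis(side = 1, at = h$breaks, labels = edges)
```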

For completeness, the lattice solution would be

library(lattice)
histogram(~x, dfr, scales = list(x = list(log = TRUE)))

AN EXPLANATION OF WHY LOG VALUES ARE NEEDED IN THE BASE CASE:

If you plot the data with no log-transformation, then most of the data are clumped into bars at the left.

hist(dfr$x)

The hist function ignores the log argument (because it interferes with the calculation of breaks), so this doesn't work.

hist(dfr$x, log = "y")

Neither does this.

par(xlog = TRUE)
hist(dfr$x)

That means that we need to log transform the data before we draw the plot.

hist(log10(dfr$x))

Unfortunately, this messes up the axes, which brings us to the workaround above.

Richie Cotton
As Joris mentions, in the base case setting `xaxt = "n"` is cleaner than `axes = FALSE`, since you don't need to manually create the y axis.
Richie Cotton
I don't understand the base graphics example - do you take the log of the values (`log10(dfr$x)`)? Why?
David B
Also, please see my update regarding your nice ggplot2 solution (+1).
David B
+2  A: 

Using ggplot2 seems like the easiest option. If you want more control over your axes and your breaks, you can do something like the following:

EDIT: new code provided

x <- c(rexp(1000,0.5)+0.5,rexp(100,0.5)*100)

breaks<- c(0,0.1,0.2,0.5,1,2,5,10,20,50,100,200,500,1000,10000)
major <- c(0.1,1,10,100,1000,10000)


H <- hist(log10(x),plot=FALSE)


plot(H$mids,H$counts,type="n",
      xaxt="n",
      xlab="X",ylab="Counts",
      main="Histogram of X",
      bg="lightgrey"
)
abline(v=log10(breaks),col="lightgrey",lty=2)
abline(v=log10(major),col="lightgrey")
abline(h=pretty(H$counts),col="lightgrey")
plot(H,add=TRUE,freq=TRUE,col="blue")
#Position of ticks
at <- log10(breaks)

#Creation X axis
axis(1,at=at,labels=10^at)

This is as close as I can get to the ggplot2 look. Making the background grey is not that straightforward, but it is doable: draw a rectangle the size of the plot region and fill it with grey.

Check all the functions I used, and also ?par. It will allow you to build your own graphs. Hope this helps.
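The grey background could be sketched roughly as follows, assuming the plot region is read from `par("usr")` and filled with `rect()` before the bars are drawn. The explicit `xlim` and `ylim` are an addition here so that the full bars fit inside the region.

```r
set.seed(1)
x <- c(rexp(1000, 0.5) + 0.5, rexp(100, 0.5) * 100)
H <- hist(log10(x), plot = FALSE)

# Empty plot that spans the full extent of the bars
plot(H$mids, H$counts, type = "n", xaxt = "n",
     xlim = range(H$breaks), ylim = c(0, max(H$counts)),
     xlab = "X", ylab = "Counts", main = "Histogram of X")

# par("usr") holds the plot region as c(x1, x2, y1, y2) in user coordinates
usr <- par("usr")
rect(usr[1], usr[3], usr[2], usr[4], col = "lightgrey", border = NA)

# Draw the bars on top of the grey rectangle, then the axis and frame
plot(H, add = TRUE, col = "blue")
at <- pretty(H$breaks)
axis(1, at = at, labels = 10^at)
box()
```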


Joris Meys
`breaks` defines where you put the ticks and the labels; `major` defines where you put the major vertical lines. With some extra code, you can add ticks and lines where you want. An extra `axis()` call with `labels=NA` should do the trick, I guess.
Joris Meys
+1 thank you Joris for all the help!
David B