I'm trying to generate a histogram in R on a logarithmic scale, but I haven't a clue where to start. I've searched Google, but nothing I've found really does what I want.

To plot the histogram I'm using:

hist(mydata$V3, breaks=c(0,1,2,3,4,5,25))

This gives me a histogram, but the density between 0 and 1 is so much greater than elsewhere (a difference of about a million values) that you can barely make out any of the other bars.

Then I tried:

mydata_hist <- hist(mydata$V3, breaks=c(0,1,2,3,4,5,25), plot=FALSE)
plot(mydata_hist$counts, log="xy", pch=20, col="blue")

It gives me sort of what I want, but the bottom axis shows the values 1-6 rather than 0, 1, 2, 3, 4, 5, 25. It's also showing the data as points rather than bars. barplot works, but then I don't get any bottom axis.

TIA,

Weegee

+5  A: 

A histogram is a poor man's density estimate. Note that hist() with default arguments plots frequencies, not densities, when the breaks are equally spaced; with unequal breaks like yours it switches to densities. Add prob=TRUE to the call if you want densities regardless.

As for the log axis problem, don't use 'x' if you do not want the x-axis transformed:

plot(mydata_hist$counts, log="y", type='h', lwd=10, lend=2)

gets you bars on a log-y scale -- the look-and-feel is still a little different but can probably be tweaked.

Lastly, you can also do hist(log(x), ...) to get a histogram of the log of your data.
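
One thing to watch with that last suggestion: log(0) is -Inf, so zeros cannot appear in a histogram of logged values. A minimal sketch on simulated data (the vector x and its distribution are assumptions standing in for mydata$V3, not the poster's data):

```r
# Simulated, heavily right-skewed data standing in for mydata$V3 (an assumption).
set.seed(1)
x <- c(rep(0, 50), rexp(10000, rate = 5), runif(20, 5, 25))

# log10(0) is -Inf, so keep only strictly positive values before transforming.
pos <- x[x > 0]
h <- hist(log10(pos), breaks = 30, plot = FALSE)

# Draw the histogram of the logged values; the x-axis is in log10 units.
plot(h, main = "Histogram of log10(x)", xlab = "log10(x)")
```

If the zeros matter, they would need to be reported separately or shifted (for example with log1p()), which changes what the axis means.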

Dirk Eddelbuettel
Excellent! How can I modify the axis on the bottom though? Rather than showing 1, 2, 3, 4, 5, 6, I'd like to show 0 <= 1, 1 <= 2, etc.
Weegee
Suppressing the axis in plot() and then calling axis() explicitly, giving the 'where' and 'what', lets you do that.
Dirk Eddelbuettel
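
A minimal sketch of that suggestion, assuming simulated data in place of mydata$V3 and a hypothetical "lower-upper" label format for the bins. xaxt="n" is the standard graphics parameter for suppressing an axis, and axis(1, ...) draws the bottom one:

```r
# Simulated data standing in for mydata$V3 (an assumption).
set.seed(1)
x <- c(rexp(10000, rate = 1), runif(20, 5, 25))
x <- pmin(x, 25)  # keep everything inside the last break

brks <- c(0, 1, 2, 3, 4, 5, 25)
h <- hist(x, breaks = brks, plot = FALSE)

# One label per bin, built from the break points: "0-1", "1-2", ..., "5-25".
labs <- paste(head(brks, -1), tail(brks, -1), sep = "-")

# Empty bins (count 0) cannot be drawn on a log scale, so turn them into NA.
cnt <- ifelse(h$counts > 0, h$counts, NA)

# xaxt = "n" suppresses the default 1..6 axis; log = "y" logs only the counts.
plot(cnt, log = "y", type = "h", lwd = 10, lend = 2,
     xaxt = "n", xlab = "V3 bin", ylab = "count")

# The 'where' (at) and the 'what' (labels) for the bottom axis.
axis(1, at = seq_along(labs), labels = labs)
```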
Thank you. I think I've got it figured out.
Weegee
+1  A: 

Another option would be to use the ggplot2 package.

library(ggplot2)
ggplot(mydata, aes(x = V3)) + geom_histogram() + scale_x_log10()
Thierry
+1  A: 

It's not entirely clear from your question whether you want a logged x-axis or a logged y-axis. A logged y-axis is not a good idea when using bars because they are anchored at zero, which becomes negative infinity when logged. You can work around this problem by using a frequency polygon or density plot.
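
A rough base-R sketch of the frequency-polygon workaround, on simulated data (an assumption, not the poster's data). It uses equal-width bins and skips empty ones, since a zero count has no position on a log axis. ggplot2's geom_freqpoly() expresses the same idea.

```r
set.seed(1)
x <- rexp(10000, rate = 1)   # simulated stand-in for mydata$V3 (an assumption)

h <- hist(x, breaks = 40, plot = FALSE)   # equal-width bins
keep <- h$counts > 0                      # log(0) is undefined, so skip empty bins

# Frequency polygon: points at the bin midpoints joined by lines, with no
# bars that would need to reach down to zero.
plot(h$mids[keep], h$counts[keep], type = "b", log = "y", pch = 20,
     xlab = "V3", ylab = "count (log scale)")
```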

hadley