Is there a function in R that fits a curve to a histogram?

Let's say you had the following histogram:

hist(c(rep(65, times=5), rep(25, times=5), rep(35, times=10), rep(45, times=4)))

It looks roughly normal, but it's skewed. I want to fit a skewed, normal-like curve that wraps around this histogram.

This question is rather basic, but I can't seem to find the answer for R on the internet.

+8  A: 

If I understand your question correctly, then you probably want a density estimate along with the histogram:

X <- c(rep(65, times=5), rep(25, times=5), rep(35, times=10), rep(45, times=4))
hist(X, prob=TRUE)            # prob=TRUE for probabilities not counts
lines(density(X))             # add a density estimate with defaults
lines(density(X, adjust=2), lty="dotted")   # add another "smoother" density
Dirk Eddelbuettel
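
To see what adjust does in the answer above, here is a minimal sketch using only base R. The names d1 and d2 are just illustrative. It relies on the documented behaviour that density() multiplies its chosen bandwidth by adjust, and that the bandwidth actually used is returned in the bw component.

```r
X <- c(rep(65, times=5), rep(25, times=5), rep(35, times=10), rep(45, times=4))

d1 <- density(X)             # default bandwidth rule (bw.nrd0), Gaussian kernel
d2 <- density(X, adjust=2)   # same rule, but the bandwidth is multiplied by 2

# The bandwidth actually used is stored in the returned object
print(c(default = d1$bw, doubled = d2$bw))

hist(X, prob=TRUE)           # density scale, so the curves are comparable
lines(d1)                    # default estimate
lines(d2, lty="dotted")      # wider bandwidth gives a smoother curve
```

A larger adjust smooths away more of the bumps; a value below 1 follows the data more closely.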
+2  A: 

Here's the way I do it:

foo <- rnorm(100,mean=1,sd=2)
hist(foo,prob=TRUE)
curve(dnorm(x,mean=mean(foo),sd=sd(foo)),add=TRUE)   # add=TRUE belongs to curve(), not dnorm()

A bonus exercise is to do this with the ggplot2 package ...

John Johnson
However, if you want something that is skewed, you can either do the density example from above, transform your data (e.g. foo.log <- log(foo) and try the above), or try fitting a skewed distribution, such as the gamma or lognormal (lognormal is equivalent to taking the log and fitting a normal, btw).
John Johnson
But that still requires estimating the parameters of your distribution first.
Dirk Eddelbuettel
This gets a bit far afield from simply discussing R, as we are getting more into theoretical statistics, but you might try this link for the Gamma: http://en.wikipedia.org/wiki/Gamma_distribution#Parameter_estimation For lognormal, just take the log (assuming all data are positive) and work with the log-transformed data. For anything fancier, I think you would have to work with a statistics textbook.
John Johnson
I think you misunderstand: both the original poster and all the other answers are quite content to use non-parametric estimates -- like an old-school histogram or a somewhat more modern data-driven density estimate. Parametric estimates are great if you have good reason to suspect a particular distribution. But that was not the case here.
Dirk Eddelbuettel
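
For anyone who does want to try the parametric route mentioned in the comments above, here is a rough sketch in base R, not a recommendation that either distribution suits this data. It assumes all values are positive. The lognormal parameters come from the mean and standard deviation of log(X); note that sd() divides by n-1, so this is close to, but not exactly, the maximum-likelihood fit. The gamma parameters use simple method-of-moments estimates, which is only one of the approaches covered on the linked Wikipedia page. The variable names are illustrative.

```r
X <- c(rep(65, times=5), rep(25, times=5), rep(35, times=10), rep(45, times=4))

# Lognormal: fit a normal to log(X) (requires all X > 0)
meanlog <- mean(log(X))
sdlog   <- sd(log(X))

# Gamma: method of moments, using mean = shape/rate and var = shape/rate^2
shape <- mean(X)^2 / var(X)
rate  <- mean(X) / var(X)

hist(X, prob=TRUE)                                                # density scale
curve(dlnorm(x, meanlog=meanlog, sdlog=sdlog), add=TRUE)          # lognormal fit
curve(dgamma(x, shape=shape, rate=rate), add=TRUE, lty="dashed")  # gamma fit
```

Both curves are right-skewed by construction, so they only make sense if the data's skew goes in that direction.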
+2  A: 

Such a thing is easy with ggplot2:

library(ggplot2)
dataset <- data.frame(X = c(rep(65, times=5), rep(25, times=5), rep(35, times=10), rep(45, times=4)))
ggplot(dataset, aes(x = X)) + geom_histogram(aes(y = after_stat(density))) + geom_density()   # after_stat() needs ggplot2 >= 3.3.0; older versions use ..density..

or, to mimic the result from Dirk's solution:

ggplot(dataset, aes(x = X)) + geom_histogram(aes(y = after_stat(density)), binwidth = 5) + geom_density()   # after_stat() needs ggplot2 >= 3.3.0; older versions use ..density..
Thierry