Hello

I have two histograms.

int Hist1[10] = {1,4,3,5,2,5,4,6,3,2};

int Hist2[10] = {1,4,3,15,12,15,4,6,3,2};

Hist1's distribution is multi-modal.

Hist2's distribution is uni-modal, with a single prominent peak.

My questions are:

  1. Is there any way I could determine the type of distribution programmatically?
  2. How can I quantify whether these two histograms are similar or dissimilar?

Thanks

+1  A: 

These are just guesses, but I would try fitting a Gaussian to each distribution and using something like the R-squared value of the fit to judge whether the distribution is uni-modal or not.

As to the similarity between the two distributions, I would try computing the cross-correlation of the two histograms and using its peak positive value as a similarity measure. These ideas are pretty rough, but hopefully they give you something to start from.

Justin Peel
A: 

Comparison of Histograms (For Use in Cloud Modeling).

(That's an MS .doc file.)

+1  A: 

For #2, you could calculate their cross-correlation (so long as the buckets themselves can be sorted). That would give you a rough estimate of their similarity.

Frank Krueger
A: 

There are a variety of software packages that will "fit" your distributions to known discrete distributions for you - Minitab, STATA, R, etc. A reference to fitting distributions in R is here. I wouldn't advise programming this from scratch.

Regarding distribution comparisons, if neither distribution fits a known distribution (Poisson, Binomial, etc.), then you need to use non-parametric methods described here.
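As one example of a non-parametric comparison (not necessarily the one the link describes), the two-sample Kolmogorov-Smirnov statistic is easy to compute from binned counts: normalize each histogram to a cumulative distribution and take the largest gap between the two. The function name `ks_statistic` below is hypothetical, and the sketch assumes both histograms use the same ordered bins.

```c
#include <stddef.h>

/* Hypothetical helper: largest absolute gap between the two empirical CDFs.
   Returns a value in [0, 1]; 0 means the normalized histograms are identical.
   Assumes both histograms share the same n ordered bins. */
double ks_statistic(const int *a, const int *b, size_t n)
{
    double total_a = 0.0, total_b = 0.0;
    for (size_t i = 0; i < n; i++) {
        total_a += a[i];
        total_b += b[i];
    }
    if (total_a <= 0.0 || total_b <= 0.0)
        return 0.0;                      /* an empty histogram cannot be compared */

    double cum_a = 0.0, cum_b = 0.0, max_gap = 0.0;
    for (size_t i = 0; i < n; i++) {
        cum_a += a[i];
        cum_b += b[i];
        double gap = cum_a / total_a - cum_b / total_b;
        if (gap < 0.0)
            gap = -gap;
        if (gap > max_gap)
            max_gap = gap;
    }
    return max_gap;
}
```

Turning this statistic into a p-value needs the sample sizes and a reference distribution, which is the part better left to a package such as R. Binned data also makes the usual KS significance levels approximate.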

Grembo