tags:

views:

209

answers:

5

When should the Monte-Carlo method be used?

For example, why did Joel decide to use the Monte-Carlo method for Evidence Based Scheduling instead of methodically processing all user data for the past year?

A: 

Wikipedia has a good article on monte carlo simulation methods. I've used monte carlo on a few occasions - in a nutshell MC methods tend to give accurate-ish answers when trying to project results using sample sets that are pretty much random and somebody would typically use intuition to try and guess at a trend. Unfortunately trying to explain MC methods is pretty tough so check out the article.

Jarrod
A: 

Because the estimates are usually pretty widely distributed when scheduling programming tasks it makes more sense to treat them statistically.

If we take a project which takes 100's of tasks the errors on the estimates will even out and you end up with a distribution which shows the likelihood of project completion as a range.

It also circumvents some serious issues like task buffering and student syndrome skewing the results even further.

Peter Tillemans
+1  A: 

Sometimes checking all the options is simply prohibitive.

Loren Pechtel
+2  A: 

Suppose that you want to estimate some quantity of interest. In the Joel's example 'ship date' is what you want to estimate. In most such situations, there are random factors that impact our estimates.

When you have a random quantity, you typically wants to know its mean and the standard deviation so that you can take appropriate actions. In simple situations, you can model the quantity as a standard distribution (e.g., normal distribution) for which analytical formulas exist for the mean and the standard deviation. However, there exist many situations where analytical formulas do not exist. In such situations, instead of an analytic solution for the mean and the standard deviation, we resort to simulation. The idea is:

Step 1: Generate factors that impact the quantity of interest using appropriate distributions

Step 2: Compute quantity of interest

Repeat steps 1 and 2 many times and compute the empirical average and standard deviation for what you want to know.

The above is by far the typical application of monte carlo application. See the wikipedia link provided by Jarrod for several such applications and some examples of interesting applications where there is no inherent randomness (e.g., estimation of pi).

Anon
I like your answer except that the steps you give are very vague. Can you make them more precise somehow?
Gili
Well, monte carlo is a vast area with lots of applications. As an example, suppose that you want have some data about various project characteristics (e.g., no of developers, target OS etc) and ship times (e.g., 3 months, 6 months etc). You may already know the relationship between project characteristics and ship times. For example, Ship Times ~ N(mu,sigma^2) I(Ship Times >0) where N(.) indicates a normal distribution, mu and sigma are function of project characteristics and I(Ship Times > 0) expresses the fact that ship times cannot be negative.
Anon
You may want to know the consequences of changing some project parameter (e.g., increase no of developers) on ship times. Unfortunately, no closed form expression exists for the mean of a truncated normal. So, what you would do is:Step 1: Generate a truncated normal using rejection sampling or inverse transform methodStep 2. Store ship time (in this case step 2 involves no computation)Repeat steps 1 and 2 N times and compute the mean and std dev of ship times that you stored in step 2.The above assumes that you know the relationship between the project parameters and mu and sigma.
Anon
If you do not know that relationship then you would of course need to model that relationship and estimate the relevant parameters. For example, you could assume that mu = beta1 * (No of developers) + beta2 * (No of meetings with clients) etc and estimate beta1, beta2 etc. Hope that helps.
Anon
A: 

Monte Carlo methods are commonly used when the dimensionality of the problem is too high for traditional schemes. A great introductory paper on the subject is Persi Diaconis' The Markov Chain Monte Carlo Revolution.

jmbr
Interesting paper, but I quickly got lost in the details.
Gili