I have a database with many CVs, including structured data of the gender, age, address, number of years of education, and many other parameters of each person.

For about 10% of the sample, I also have additional data about a certain action they've made at some point in time. For instance, that Jane took a home loan in July 1998 or that John started pilot training in Jan. 2007 and got his license in Dec. 2007.

I need an algorithm that will give, for each of the actions, the probability that it will happen for each person in future time increments. For instance, that the chance of Bill taking a home loan is 2% in 2011, 3.5% in 2012, etc.

How should I approach this? Regression analysis? SVM? Neural net? Something else?

Is there perhaps even some standard tool/library that I can use with just the obvious customizations?

+1  A: 

The probability that X happens given that Y happened is right out of Bayesian inference, I think.
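To make "the probability that X happens given that Y happened" concrete, here is a minimal sketch in Python. The records, the attribute names, and the `conditional_probability` helper are all hypothetical and only for illustration. It estimates P(X | Y) from counts, using a uniform Beta(1, 1) prior so that small groups do not produce extreme estimates.

```python
# Hypothetical toy records: (attribute Y, whether action X happened).
# These values are invented for illustration; they are not from the question.
records = [
    ("homeowner", True),
    ("homeowner", True),
    ("homeowner", False),
    ("renter", True),
    ("renter", True),
    ("renter", False),
    ("renter", False),
]


def conditional_probability(records, y, alpha=1.0, beta=1.0):
    """Posterior mean of P(X | Y = y) under a Beta(alpha, beta) prior."""
    # n: how many people have attribute y; k: how many of those did X.
    n = sum(1 for attr, _ in records if attr == y)
    k = sum(1 for attr, did_x in records if attr == y and did_x)
    return (k + alpha) / (n + alpha + beta)


p_homeowner = conditional_probability(records, "homeowner")
p_renter = conditional_probability(records, "renter")
print(f"P(X | homeowner) ~ {p_homeowner:.2f}")
print(f"P(X | renter)    ~ {p_renter:.2f}")
```

With many attributes (gender, age, education and so on) simple counting like this stops working because most combinations are never observed, so a model that pools information across attributes would be needed.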

Lou Franco
As I understood the question, there is no Y; the question is only about the chance that X happens.
Marek
+1  A: 

Lou is right, this is the case for 'Bayesian Inference'.

The best tool/library to solve this is the R statistical programming language (r-project.org).

Take a look at the Bayesian Inference Libraries in R: http://cran.r-project.org/web/views/Bayesian.html

How many people are in the "10% of the sample"? If it is fewer than about 100 people, I would worry that the results of the analysis will not be statistically significant. With 1000 or more people, the results should be quite good (rule of thumb).
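A rough way to see why the sample size matters is the sketch below, in Python rather than R. The `half_width` helper is hypothetical, and it uses the simple normal approximation, which is itself unreliable for rare events in small samples. It computes the approximate 95% margin of error around an estimated probability of 2% (the figure used as an example in the question) for 100 and for 1000 people.

```python
import math


def half_width(p, n, z=1.96):
    """Approximate 95% margin of error for a proportion p estimated from n people.

    Uses the normal approximation, so treat the result as a rough guide only.
    """
    return z * math.sqrt(p * (1.0 - p) / n)


p = 0.02  # assumed true yearly probability, taken from the question's example
for n in (100, 1000):
    print(f"n = {n:4d}: estimate {p:.3f} +/- {half_width(p, n):.4f}")
```

With 100 people the margin of error is larger than the 2% estimate itself, while with 1000 people it drops below one percentage point.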

I would first export the data to R (r-project) and do any necessary data cleaning. Then find a person familiar with R and advanced statistics; they will be able to solve this very quickly. Or try it yourself, but R takes some time to learn in the beginning.

mrsteve