We have a production web-based product that lets users make predictions about the future value (or demand) of goods. The historical data contains about 100k examples, and each example has about 5 parameters.
Consider a class of data called a prediction:
prediction {
    id: int
    predictor: int
    predictionDate: date
    predictedProductId: int
    predictedDirection: byte (0 for decrease, 1 for increase)
    valueAtPrediciton: float
}
and a paired result class that measures the result of the prediction:
predictionResult {
    id: int
    valueTenDaysAfterPrediction: float
    valueTwentyDaysAfterPrediction: float
    valueThirtyDaysAfterPrediction: float
}
We can define a test for success: a prediction succeeds if at least two of the three future value checkpoints are favorable, given the predicted direction and the value at the time of prediction.
success(p: prediction, r: predictionResult): bool =
    count: int
    count = 0
    if p.predictedDirection = 0 then
        // value is predicted to fall
        if p.valueAtPrediciton > r.valueTenDaysAfterPrediction then count = count + 1
        if p.valueAtPrediciton > r.valueTwentyDaysAfterPrediction then count = count + 1
        if p.valueAtPrediciton > r.valueThirtyDaysAfterPrediction then count = count + 1
    else
        // value is predicted to increase
        if p.valueAtPrediciton < r.valueTenDaysAfterPrediction then count = count + 1
        if p.valueAtPrediciton < r.valueTwentyDaysAfterPrediction then count = count + 1
        if p.valueAtPrediciton < r.valueThirtyDaysAfterPrediction then count = count + 1
    // success if count = 2 or count = 3
    return (count > 1)
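For concreteness, here is the same rule as a runnable Python sketch. The dataclass names and snake_case field names are my own renaming of the fields above, and I am assuming a result row shares its id with the prediction it belongs to.

```python
from dataclasses import dataclass
from datetime import date


@dataclass
class Prediction:
    id: int
    predictor: int
    prediction_date: date
    predicted_product_id: int
    predicted_direction: int  # 0 for decrease, 1 for increase
    value_at_prediction: float


@dataclass
class PredictionResult:
    id: int  # assumed to match Prediction.id
    value_10_days: float
    value_20_days: float
    value_30_days: float


def success(p: Prediction, r: PredictionResult) -> bool:
    """Return True when at least two of the three checkpoints are favorable."""
    checkpoints = (r.value_10_days, r.value_20_days, r.value_30_days)
    if p.predicted_direction == 0:
        # value is predicted to fall
        favorable = sum(v < p.value_at_prediction for v in checkpoints)
    else:
        # value is predicted to increase
        favorable = sum(v > p.value_at_prediction for v in checkpoints)
    return favorable >= 2


# Made-up example rows, only to show the call shape.
p = Prediction(1, 7, date(2020, 1, 1), 42, 1, 100.0)
r = PredictionResult(1, 101.0, 99.0, 105.0)
print(success(p, r))
```

Running this over the whole history would give one true/false label per past prediction, which is the value I would like to estimate in advance for new predictions.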
Everything in the prediction class is known the moment the user submits the form, while the information in predictionResult is not known until later. Ideally, a model or algorithm could be derived from our three-year history so that, when it is applied to a new prediction, we get a probability of whether it will be a success (I would be happy with a boolean Y/N flag as to whether the prediction is interesting or not).
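To make the goal concrete, here is the kind of naive baseline I can picture. It is not a real model, only a sketch of the input and output I am after. It estimates each predictor's historical success rate, smoothed so that predictors with little history land near 50%. The function name, the prior counts, and the 0.55 threshold are all hypothetical, and it ignores every field except the predictor, which a proper approach presumably would not.

```python
from collections import defaultdict


def fit_success_rates(history, prior_successes=1.0, prior_failures=1.0):
    """Build a probability lookup from labeled history.

    history: iterable of (predictor_id, succeeded) pairs, where succeeded
    is the outcome of the success rule for a past prediction.
    """
    counts = defaultdict(lambda: [0, 0])  # predictor_id -> [successes, total]
    for predictor_id, succeeded in history:
        counts[predictor_id][0] += int(succeeded)
        counts[predictor_id][1] += 1

    def probability(predictor_id):
        # Unknown predictors fall back to the prior alone.
        successes, total = counts.get(predictor_id, (0, 0))
        return (successes + prior_successes) / (
            total + prior_successes + prior_failures
        )

    return probability


# Made-up labeled history: (predictor id, did the prediction succeed?)
history = [(7, True), (7, True), (7, False), (8, False)]
prob = fit_success_rates(history)

THRESHOLD = 0.55  # hypothetical cut-off for the Y/N "interesting" flag
for predictor_id in (7, 8, 99):
    p = prob(predictor_id)
    print(predictor_id, round(p, 3), p >= THRESHOLD)
```

What I am hoping to learn is what replaces this lookup when the product, the direction, the date, and the value at prediction time should all influence the estimate.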
I don't know much about machine learning, and I am trying to make my way through the material. It would be great to have some guidance so I can research and practice exactly what I need to solve a problem like this.
Thank you