I'm trying to create a simple STRIPS-based planner. I've completed the basic functionality to compute separate probabilistic plans that will reach a goal, but now I'm trying to determine how to aggregate these plans based on their initial action, to determine what the "overall" best action is at time t0.

Consider the following example. Utility, bounded between 0 and 1, represents how well the plan accomplishes the goal. CF, also bounded between 0 and 1, represents the certainty-factor, or the probability that performing the plan will result in the given utility.

Plan1: CF=0.01, Utility=0.7
Plan2: CF=0.002, Utility=0.9
Plan3: CF=0.03, Utility=0.03

If all three plans, which are mutually exclusive, start with the action A1, how should I aggregate them to determine the overall "fitness" for using action A1? My first thought is to sum the certainty-factors, and multiply that by the average of the utilities. Does that seem correct?

So my current result would look like:

fitness(A1) = (0.01 + 0.002 + 0.03) * (0.7 + 0.9 + 0.03)/3. = 0.02282

Or should I calculate the individual likely utilities, and average those?

fitness(A1) = (0.01*0.7 + 0.002*0.9 + 0.03*0.03)/3. = 0.00323
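For concreteness, the two candidate aggregations above can be sketched in a few lines of Python, using the three example plans as `(CF, utility)` pairs:

```python
# The three example plans as (CF, utility) pairs, where CF is the
# probability the plan succeeds and yields its utility (0 otherwise).
plans = [(0.01, 0.7), (0.002, 0.9), (0.03, 0.03)]

# Option 1: sum of certainty-factors times the average utility.
fitness_1 = sum(cf for cf, u in plans) * (sum(u for cf, u in plans) / len(plans))

# Option 2: average of the per-plan expected utilities CF * utility.
fitness_2 = sum(cf * u for cf, u in plans) / len(plans)

print(round(fitness_1, 5))  # 0.02282
print(round(fitness_2, 5))  # 0.00323
```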

Is there a more theoretically sound way?

+1  A: 

I think that the fitness function you are talking about would also have to consider all the plans that don't have A1 as the first action. They could all be really good, in which case doing A1 is a bad idea, or they might be terrible, in which case doing A1 looks like a good move.

Looking at your ideas, the second one makes much more sense to me. It calculates the expected utility of picking a plan uniformly at random from all the plans that start with A1. This is under the assumption that a plan either achieves the given utility or fails completely. For example, the first plan yields utility 0.7 with probability 0.01 and utility 0 with probability 0.99. This seems like a reasonable assumption; it's all you can do unless you have more data to work with.

So here's my suggestion: Let A1 be all plans starting with A1 and ~A1 be all plans not-starting with A1. Then

F(A1) = fitness(A1) / fitness(~A1)

where fitness is as you defined it in the second example.

This should give you a ratio of expected utility for plans starting with A1 against ones that don't. If it's greater than one, A1 looks like a good action.
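A minimal sketch of this ratio, reusing the question's second fitness definition. The plans not starting with A1 are made-up numbers purely for illustration:

```python
def fitness(plans):
    """Average expected utility over (CF, utility) pairs (the asker's second option)."""
    return sum(cf * u for cf, u in plans) / len(plans)

plans_a1 = [(0.01, 0.7), (0.002, 0.9), (0.03, 0.03)]  # plans starting with A1
plans_not_a1 = [(0.05, 0.2), (0.01, 0.5)]             # hypothetical other plans

F_a1 = fitness(plans_a1) / fitness(plans_not_a1)
# F_a1 > 1 would suggest A1 is a good first action; here it is well below 1.
```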

StompChicken
+1  A: 

If you take action A1, then you have to decide which of the 3 mutually exclusive plans to follow. At that point we can calculate that the expected utility of plan 1 is

E[plan1] = Prob[plan1 succeeds]*utility-for-success 
           + Prob[plan1 fails]*utility-of-failure
         = .01*.7 + .99*0 //I assume 0
         = .007

Similarly for the other 2 plans. But, since you can only choose one plan, the real expected utility (which I think is what you mean by "fitness") from taking action A1 is

max(E[plan1],E[plan2],E[plan3]) = fitness(A1)
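In code, this max-expected-utility fitness is a one-liner over the same `(CF, utility)` pairs, assuming (as above) that a failed plan yields utility 0:

```python
plans = [(0.01, 0.7), (0.002, 0.9), (0.03, 0.03)]  # (CF, utility) pairs

# Expected utility of each plan is CF * utility; pick the best plan.
fitness_a1 = max(cf * u for cf, u in plans)
print(round(fitness_a1, 4))  # 0.007, from plan 1
```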
Jose M Vidal
A: 

If you're interested in probabilistic planning, you should have a look at the POMDP model and algorithms like value iteration.

Edit:

Actually, I should have pointed you to Markov Decision Process (without the PO). I'm sorry.

What you should probably do for your problem is to maximize the expected utility. Call this the fitness.
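To make the MDP suggestion concrete, here is a toy value-iteration sketch. The states, actions, transition probabilities, and rewards are all invented for illustration; they are not derived from the question's numbers:

```python
# Toy MDP: from s0, each action reaches the absorbing goal state with some
# probability. Value iteration finds the best first action.
states = ["s0", "goal"]
actions = ["A1", "A2"]
# T[(state, action)] = list of (next_state, probability); numbers are made up.
T = {
    ("s0", "A1"):   [("goal", 0.10), ("s0", 0.90)],
    ("s0", "A2"):   [("goal", 0.02), ("s0", 0.98)],
    ("goal", "A1"): [("goal", 1.0)],
    ("goal", "A2"): [("goal", 1.0)],
}
R = {"s0": 0.0, "goal": 1.0}  # reward for being in each state
gamma = 0.9                   # discount factor

V = {s: 0.0 for s in states}
for _ in range(100):  # fixed iteration count; enough to converge here
    V = {s: R[s] + gamma * max(sum(p * V[s2] for s2, p in T[(s, a)])
                               for a in actions)
         for s in states}

best = max(actions, key=lambda a: sum(p * V[s2] for s2, p in T[("s0", a)]))
print(best)  # prints "A1": it reaches the goal with higher probability
```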

ziggystar
Reading through the article, I take your answer to mean: fitness(A1) = max(0.01*0.7,0.002*0.9,0.03*0.03) = 0.0070
Chris S