ansaurus

Question

Answer 1

+2 A:

Just as a refresher on terminology: in Q-learning, you are trying to learn the Q-function, which depends on the state and the action:

Q(S,A) = ????

The standard version of Q-learning, as taught in most classes, tells you that for each S and A you need to learn a separate value in a table, and it tells you how to perform Bellman updates so that the table converges to the optimal values.
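For reference, the tabular update mentioned above can be sketched as follows. This is a minimal sketch, not a full agent: the function name q_update, the step size alpha, the discount gamma, and the toy states and actions are all assumptions chosen for illustration.

```python
from collections import defaultdict

def q_update(Q, s, a, reward, s_next, actions, alpha=0.1, gamma=0.9):
    """Apply one tabular Q-learning update for the transition (s, a, reward, s_next)."""
    # Value of the best action available in the next state.
    best_next = max(Q[(s_next, a2)] for a2 in actions)
    # Bellman target: immediate reward plus discounted future value.
    target = reward + gamma * best_next
    # Move the table entry a fraction alpha toward the target.
    Q[(s, a)] += alpha * (target - Q[(s, a)])
    return Q[(s, a)]

# Hypothetical two-action problem; the table starts at zero everywhere.
Q = defaultdict(float)
actions = ["left", "right"]
new_value = q_update(Q, s=0, a="right", reward=1.0, s_next=1, actions=actions)
```

Each (state, action) pair has its own entry in Q, which is exactly the "separate value in a table" described above.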

Now, let's say that instead of a table you use a different function approximator. For example, let's try linear functions. Take your (S,A) pair and think of a bunch of features you can extract from it. One example of a feature is "Am I next to a wall?", another is "Will the action place the object next to a wall?", etc. Number these features f1(S,A), f2(S,A), ...

Now, try to learn the Q-function as a linear function of those features:

Q(S,A) = w1*f1(S,A) + w2*f2(S,A) + ... + wN*fN(S,A)
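To make the linear form concrete, here is a small sketch of evaluating Q(S,A) as a weighted sum of features. The corridor world, the two features, and the weight values are hypothetical and only illustrate the shape of the computation; how to learn the weights is deliberately left out.

```python
def features(state, action):
    """Hypothetical features for a 1-D corridor.

    Walls occupy positions 0 and 4, so cells 1 and 3 are next to a wall.
    """
    x_next = state + (1 if action == "right" else -1)
    next_to_wall = 1.0 if state in (1, 3) else 0.0           # f1: am I next to a wall?
    will_be_next_to_wall = 1.0 if x_next in (1, 3) else 0.0  # f2: will the action put me next to one?
    return [next_to_wall, will_be_next_to_wall]

def q_value(weights, state, action):
    """Q(S,A) = w1*f1(S,A) + w2*f2(S,A) + ... + wN*fN(S,A)."""
    return sum(w * f for w, f in zip(weights, features(state, action)))

# Arbitrary example weights, not learned values.
weights = [0.5, -2.0]
q = q_value(weights, state=2, action="right")
```

Note that the number of weights depends only on the number of features, not on the number of states or actions.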

How should you learn the weights w? Well, since this is homework, I'll let you think about it on your own.

However, as a hint, let's say that you have K possible states and M possible actions in each state. Let's say you define K*M features, each of which is an indicator of whether you are in a particular state and are going to take a particular action. So

Q(S,A) = w11 * (S == 1 && A == 1) + w12 * (S == 1 && A == 2) + w21 * (S == 2 && A == 1) + ...

Now, notice that for any state/action pair, only one feature will be 1 and the rest will be 0, so Q(S,A) will be equal to the corresponding w and you are essentially learning a table. So you can think of standard, tabular Q-learning as a special case of learning with these linear functions. Think about what the normal Q-learning algorithm does to a table entry, and what you should therefore do to the weights.
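The equivalence between indicator features and a table can be checked with a short sketch. The sizes K and M, the 0-based indexing, and the weight values below are assumptions made for the example.

```python
K, M = 3, 2  # assumed number of states and actions, indexed from 0 here

def indicator_features(s, a):
    """Return K*M features; only the one for this (s, a) pair is 1."""
    f = [0.0] * (K * M)
    f[s * M + a] = 1.0
    return f

def q_linear(weights, s, a):
    """Linear Q-function over the indicator features."""
    return sum(w * f for w, f in zip(weights, indicator_features(s, a)))

# Arbitrary weights, one per (state, action) pair.
weights = [float(i) for i in range(K * M)]

# The same numbers laid out as a K-by-M table.
table = [[weights[s * M + a] for a in range(M)] for s in range(K)]

# The linear Q-function and the table lookup agree for every pair.
matches = all(
    q_linear(weights, s, a) == table[s][a]
    for s in range(K)
    for a in range(M)
)
```

With these features, each weight plays the role of exactly one table cell.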

Hopefully you can find a small basis of features, far fewer than K*M, that still lets you represent your space well.

bsdfish 2009-10-13 05:48:32

Hello again. Only now do I understand Q-learning well enough to follow your answer! Is there somewhere I can read a more detailed version of this explanation? Thanks!

devoured elysium 2009-11-14 03:52:15

bsdfish 2009-11-15 21:50:02

Generalization functions for Q-Learning
