Update Rule in Temporal difference | ansaurus

Q:

The TD(0) Q-learning update rule is:

Q(t-1) = (1-alpha) * Q(t-1) + (alpha) * (Reward(t-1) + gamma* Max( Q(t) ) )
where Max( Q(t) ) is the maximum Q-value obtainable in the next state. After the update, take either the current best action (to optimize) or a random action (to explore).
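A minimal sketch of this rule together with the explore/optimize choice, assuming a tabular Q stored as a Python dictionary keyed by (state, action) and epsilon-greedy action selection. The function names and the `epsilon` parameter are illustrative assumptions, not taken from the question:

```python
import random
from collections import defaultdict

def epsilon_greedy(Q, state, actions, epsilon, rng=random):
    """Random action with probability epsilon (explore), else the greedy one (optimize)."""
    if rng.random() < epsilon:
        return rng.choice(actions)
    return max(actions, key=lambda a: Q[(state, a)])

def q_learning_update(Q, s_prev, a_prev, reward, s_next, actions, alpha, gamma):
    """TD(0) Q-learning: Q(s,a) <- (1-alpha)*Q(s,a) + alpha*(r + gamma*max_a' Q(s',a'))."""
    max_next_q = max(Q[(s_next, a)] for a in actions)  # Max( Q(t) ) in the question
    target = reward + gamma * max_next_q
    Q[(s_prev, a_prev)] = (1 - alpha) * Q[(s_prev, a_prev)] + alpha * target
    return Q[(s_prev, a_prev)]

Q = defaultdict(float)  # unseen (state, action) pairs start at 0.0
Q[("s1", "right")] = 2.0
new_q = q_learning_update(Q, "s0", "right", 1.0, "s1", ["left", "right"], alpha=0.5, gamma=0.9)
# new_q is approximately 1.4: 0.5 * 0.0 + 0.5 * (1.0 + 0.9 * 2.0)
next_action = epsilon_greedy(Q, "s1", ["left", "right"], epsilon=0.1)
```

Because the target uses the maximum over the next state's actions, whichever action is actually taken next (greedy or random) does not change this one-step update; this is what makes Q-learning off-policy.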

For TD(1), I think the update rule would be:

Q(t-2) = (1-alpha) * Q(t-2) + (alpha) * (Reward(t-2) + gamma * Reward(t-1) + gamma * gamma * Max( Q(t) ) )
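The target in this formula, two observed rewards followed by a bootstrapped maximum, is what is usually called a two-step (n = 2) return; TD(1) in the TD(lambda) sense corresponds to the full Monte Carlo return instead. A minimal sketch, assuming the same dictionary-based Q table; the function name `two_step_q_update` and the `truncate_if_exploratory` flag are hypothetical. Here `r1` is the reward produced by whatever action was actually taken at t-1, so if that action was random the target reflects exploratory behaviour, which is the concern raised in the question. The optional truncation falls back to the one-step target in that case, in the spirit of Watkins's Q(lambda); it is an illustrative choice, not the only remedy.

```python
from collections import defaultdict

def two_step_q_update(Q, s0, a0, r0, s1, a1, r1, s2, actions,
                      alpha, gamma, truncate_if_exploratory=False):
    """Move Q(s0, a0) toward r0 + gamma*r1 + gamma**2 * max_a Q(s2, a).

    (s0, a0, r0) play the role of t-2, (s1, a1, r1) of t-1 and s2 of t
    in the question's notation.
    """
    best_q1 = max(Q[(s1, a)] for a in actions)
    if truncate_if_exploratory and Q[(s1, a1)] < best_q1:
        # a1 was not greedy: hypothetical Watkins-style fallback to the one-step target
        target = r0 + gamma * best_q1
    else:
        best_q2 = max(Q[(s2, a)] for a in actions)
        target = r0 + gamma * r1 + gamma ** 2 * best_q2
    Q[(s0, a0)] = (1 - alpha) * Q[(s0, a0)] + alpha * target
    return Q[(s0, a0)]
```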

My question:
The term gamma * Reward(t-1) seems to assume that I always take my best action at t-1, which I think would prevent exploration.
Can someone give me a hint?

Thanks

A:

You are talking about using "eligibility traces", right? See the equations and the algorithm.

Notice the e_t(s, a) equation there: no penalty is applied when an exploratory step is taken.
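The linked equations are not reproduced here. Assuming the answer refers to Watkins's Q(lambda), the eligibility-trace version of Q-learning described in Sutton and Barto's Reinforcement Learning: An Introduction, one step of it can be sketched as below. The function name `watkins_q_lambda_step` and the dictionary tables are illustrative assumptions. The part relevant to the question is the last branch: the trace update depends on whether the next action is greedy, and after an exploratory action the traces are cut to zero rather than decayed, so rewards that follow the exploratory step are not credited to state-action pairs visited before it.

```python
from collections import defaultdict

def watkins_q_lambda_step(Q, E, s, a, r, s_next, a_next, actions, alpha, gamma, lam):
    """One step of tabular Watkins's Q(lambda) with accumulating traces.

    Q and E are defaultdict(float) tables keyed by (state, action).
    a_next is the action the behaviour policy (e.g. epsilon-greedy) chose in s_next.
    Terminal states must never be passed as s, so their Q-values stay 0.
    """
    best_next = max(Q[(s_next, b)] for b in actions)
    # If a_next ties for the maximum, it counts as the greedy action a*.
    next_is_greedy = Q[(s_next, a_next)] == best_next
    delta = r + gamma * best_next - Q[(s, a)]  # TD error toward Q(s', a*)
    E[(s, a)] += 1.0                           # accumulate the trace of the visited pair
    for key, trace in E.items():
        Q[key] += alpha * delta * trace        # all traced pairs share the TD error
    if next_is_greedy:
        for key in E:
            E[key] *= gamma * lam              # greedy step: decay traces
    else:
        E.clear()                              # exploratory step: cut all traces
    return delta
```

Here `lam` plays the role of lambda. Because traces are cut on every exploratory step, the agent can still explore without feeding non-greedy returns back into earlier Q-values.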

Ivo Danihelka 2010-05-29 18:20:39
