reinforcement-learning

How Do I Run Sutton and Barto's "Reinforcement Learning" Lisp Code?

I have been reading a lot about Reinforcement Learning lately, and I have found "Reinforcement Learning: An Introduction" to be an excellent guide. The authors helpfully provide source code for many of their worked examples. Before I begin the question I should point out that my practical knowledge of Lisp is minimal. I know the basic...

Good implementations of reinforcement learning?

For an AI class project I need to implement a reinforcement learning algorithm that beats a simple game of Tetris. The game is written in Java and we have the source code. I know the basics of reinforcement learning theory but was wondering if anyone in the SO community has hands-on experience with this type of thing. What would your ...

Generalization functions for Q-Learning

I have to do some work with Q-learning, about a guy who has to move furniture around a house (it's basically that). If the house is small enough, I can just have a matrix that represents actions/rewards, but as the house grows bigger that will not be enough, so I have to use some kind of generalization function instead. My ...
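
A minimal sketch of what such a generalization function can look like, in Python: Q(s, a) is approximated as a linear function of features, and a semi-gradient Q-learning update adjusts the weights. The feature extractor and the action names here are hypothetical placeholders, not anything from the question itself.

    import numpy as np

    NUM_FEATURES = 16
    ACTIONS = ["up", "down", "left", "right"]   # hypothetical action set
    ALPHA, GAMMA = 0.1, 0.9

    weights = {a: np.zeros(NUM_FEATURES) for a in ACTIONS}

    def features(state):
        # Hypothetical feature vector phi(s); replace with real domain
        # features (room occupancy, distances, etc.). Hashing any hashable
        # state into a one-hot slot is purely for illustration.
        phi = np.zeros(NUM_FEATURES)
        phi[hash(state) % NUM_FEATURES] = 1.0
        return phi

    def q_value(state, action):
        return weights[action] @ features(state)

    def update(state, action, reward, next_state):
        # Semi-gradient Q-learning: w_a += alpha * TD-error * phi(s)
        best_next = max(q_value(next_state, a) for a in ACTIONS)
        td_error = reward + GAMMA * best_next - q_value(state, action)
        weights[action] += ALPHA * td_error * features(state)

The key point is that the weight vector replaces the state/action matrix, so memory no longer grows with the number of states.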

Improving Q-Learning

Hello, I am currently using Q-learning to try to teach a bot how to move in a room filled with walls/obstacles. It must be able to start anywhere in the room and reach the goal state (this might be the tile that has a door, for example). Currently, when it wants to move to another tile it simply goes to that tile, but I was thinking that in the...
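
For reference, a minimal tabular Q-learning agent for this kind of gridworld, assuming Python and a hypothetical environment that maps (state, action) to (next_state, reward, done); epsilon-greedy action selection is one common way to keep the bot exploring:

    import random
    from collections import defaultdict

    ACTIONS = ["up", "down", "left", "right"]
    Q = defaultdict(float)                  # Q[(state, action)], defaults to 0
    ALPHA, GAMMA, EPSILON = 0.1, 0.9, 0.1

    def choose_action(state):
        if random.random() < EPSILON:
            return random.choice(ACTIONS)                 # explore
        return max(ACTIONS, key=lambda a: Q[(state, a)])  # exploit

    def learn(state, action, reward, next_state, done):
        target = reward
        if not done:
            target += GAMMA * max(Q[(next_state, a)] for a in ACTIONS)
        Q[(state, action)] += ALPHA * (target - Q[(state, action)])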

What are the uses of recurrent neural networks when using them with Reinforcement Learning?

I know that feedforward multi-layer neural networks with backprop are used with Reinforcement Learning to help it generalize over the actions our agent takes. That is, if we have a big state space, we can take some actions, and they will help generalize over the whole state space. What do recurrent neural networks do instead? To what tas...

Q-Learning and never-ending episodes

Let's imagine we have an (x, y) plane where a robot can move. Now we define the middle of our world as the goal state, which means that we are going to give a reward of 100 to our robot once it reaches that state. Now, let's say that there are 4 states (which I will call A, B, C, D) that can lead to the goal state. The first time we are in ...
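
A quick worked example of how that single reward of 100 at the goal spreads backwards under discounting; the value gamma = 0.9 is illustrative:

    GAMMA = 0.9

    # A state whose best action leads straight into the goal backs up the
    # full reward of 100; each extra step away multiplies by gamma once more.
    for k in range(4):
        print(k, "steps before the goal transition:", round(100 * GAMMA**k, 1))
    # 0 steps: 100.0, 1 step: 90.0, 2 steps: 81.0, 3 steps: 72.9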

Negative rewards in Q-Learning

Let's assume we're in a room where our agent can move along the x and y axes. At each point he can move up, down, right, and left. So our state space can be defined by (x, y) and our actions at each point are given by (up, down, right, left). Let's assume that whenever our agent takes an action that makes him hit a wall, we will give ...
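
One way such a reward function is often written, as a hedged sketch with illustrative positions and values:

    GOAL = (5, 5)             # hypothetical goal tile
    WALLS = {(1, 2), (3, 3)}  # hypothetical wall tiles

    def reward(state, next_state):
        if next_state in WALLS or next_state == state:  # bumped into a wall
            return -1.0                                 # negative reward
        if next_state == GOAL:
            return 100.0
        return 0.0  # or a small negative step cost to encourage short paths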

Alpha and Gamma parameters in Q-Learning

What difference does having a big or small gamma value make to the algorithm? In my view, as long as it is neither 0 nor 1, it should work exactly the same. On the other hand, whatever gamma I choose, the Q-values seem to get very close to zero really quickly (I'm getting values on the order of 10^-300 in just a quick test). How ...
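
For reference, the standard Q-learning update, sketched in Python. Note that gamma multiplies only the future term, so different values between 0 and 1 do change behavior: small gamma makes the agent myopic, large gamma far-sighted. Q-values collapsing to the order of 10^-300 usually suggests the discount is being applied multiplicatively to the stored values somewhere, rather than inside the additive TD step below:

    def q_update(q, state, action, reward, next_state, actions,
                 alpha=0.1, gamma=0.9):
        # q is a dict mapping (state, action) to a value, defaulting to 0.
        best_next = max(q.get((next_state, a), 0.0) for a in actions)
        old = q.get((state, action), 0.0)
        # Additive TD step: move old estimate toward reward + gamma * best_next.
        q[(state, action)] = old + alpha * (reward + gamma * best_next - old)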

Reinforcement learning with neural networks

I am working on a project with RL & NN. I need to determine the action vector structure that will be fed to a neural network. I have 3 different actions (A, B, and Nothing), each with different powers (e.g. A100, A50, B100, B50). I wonder what the best way is to feed these actions to a NN in order to yield the best results? 1- feed A/B to inpu...
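
One plausible encoding, offered as a sketch rather than a definitive answer: a one-hot vector for the action type plus a normalized scalar for the power, so A100 and A50 share the same type bits but differ in the last component (Python, numpy assumed):

    import numpy as np

    ACTION_TYPES = ["A", "B", "Nothing"]

    def encode_action(action_type, power, max_power=100.0):
        one_hot = np.zeros(len(ACTION_TYPES))
        one_hot[ACTION_TYPES.index(action_type)] = 1.0
        return np.append(one_hot, power / max_power)

    print(encode_action("A", 50))    # [1.  0.  0.  0.5]
    print(encode_action("B", 100))   # [0.  1.  0.  1. ]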

Reinforcement learning And POMDP

I am trying to use a multi-layer NN to implement the probability function in a Partially Observable Markov Decision Process. I thought the inputs to the NN would be: current state, selected action, result state; the output is a probability in [0, 1] (the probability that performing the selected action on the current state will lead to the result state). In training, I fed the in...
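
One way to generate valid targets for such a network, sketched in Python: train against empirical transition frequencies count(s, a, s') / count(s, a), which are guaranteed to lie in [0, 1] and therefore suit a sigmoid output unit. The bookkeeping below is illustrative:

    from collections import Counter

    transition_counts = Counter()  # observed (s, a, s') triples
    visit_counts = Counter()       # observed (s, a) pairs

    def record(s, a, s_next):
        transition_counts[(s, a, s_next)] += 1
        visit_counts[(s, a)] += 1

    def target_prob(s, a, s_next):
        # Empirical frequency of s' after doing a in s; always in [0, 1].
        n = visit_counts[(s, a)]
        return transition_counts[(s, a, s_next)] / n if n else 0.0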

Reinforcement learning toy project

My toy project to learn & apply Reinforcement Learning is:
- An agent tries to reach a goal state "safely" & "quickly"...
- But there are projectiles and rockets that are launched at the agent along the way.
- The agent can determine the rockets' positions (with some noise) only if they are "near".
- The agent then must learn to avoid crashing ...
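
A small sketch of the observation model described above, with hypothetical names, detection radius, and noise level (Python):

    import math
    import random

    SENSE_RADIUS = 5.0  # hypothetical detection range
    NOISE_STD = 0.5     # hypothetical sensor noise

    def observe_rocket(agent_pos, rocket_pos):
        if math.dist(agent_pos, rocket_pos) > SENSE_RADIUS:
            return None  # too far away: the rocket is not observed at all
        return (rocket_pos[0] + random.gauss(0, NOISE_STD),
                rocket_pos[1] + random.gauss(0, NOISE_STD))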

Learning the Structure of a Hierarchical Reinforcement Task

I've been studying hierarchical reinforcement learning problems, and while a lot of papers propose interesting ways of learning a policy, they all seem to assume they know in advance a graph structure describing the actions in the domain. For example, "The MAXQ Method for Hierarchical Reinforcement Learning" by Dietterich describes a complex...

Implementing HexQ Algorithm

Does anyone know if there's an open source implementation (in any language) of the HexQ algorithm for hierarchy discovery in reinforcement learning, or something like it? I'd like to evaluate it in different domains but I'm having trouble understanding how to implement it from the paper's description. ...

Are there any active reinforcement learning competitions?

Hi, I like doing part-time research in reinforcement learning. In recent years (up to 2009) there was a reinforcement learning competition held at rl-competition.org with some very interesting problems, but it seems to have been discontinued. I'd love to improve my skills and knowledge and measure them against other enthusiasts in the fi...