views:

46

answers:

1

I have to implement the value iteration algorithm for finding the optimal policy for each state of an MDP using Bellman's equation. The input file is some thing like below: s1 0 (a1 s1 0.5) (a1 s2 0.5) (a2 s1 1.0) s2 0 (a1 s2 1.0) (a2 s1 0.5) (a2 s3 0.5) s3 10 (a1 s2 1.0) (a2 s3 0.5) (a2 s4 0.5)

where s1 is the state 0 is the reward associated with s1. Upon taking action a1, we stay in s1 with probability 0.5. Upon taking action a1, we go to s2 with probability 0.5.Upon taking action a2, we stay in s1 with probability 1.0. And similarly the others.

After reading the input file, I have to store it in some data structure. Which would be the appropriate data structure to do so in PYTHON so that traversing through it is easy.

+2  A: 
s1 0 (a1 s1 0.5) (a1 s2 0.5) (a2 s1 1.0)
s2 0 (a1 s2 1.0) (a2 s1 0.5) (a2 s3 0.5)
s3 10 (a1 s2 1.0) (a2 s3 0.5) (a2 s4 0.5)

Something like this?

data = { 's1': { 'reward': 0,
                 'action': { 'a1': { 's1': 0.5,
                                     's2': 0.5 },
                             'a2': { 's1': 1.0 }
                           },
               },
         's2': { 'reward': 0,
                 'action': { 'a1': { 's1': 1.0 },
                             'a2': { 's1': 0.5,
                                     's2': 0.5 },
                           },
               },
         's3': { 'reward': 10,
                 'action': { 'a1': { 's2': 1.0 },
                             'a2': { 's3': 0.5,
                                     's4': 0.5 },
                           }
               }
        }
eumiro