Coding exercise 1

by Nathan Samuel Müller

Hi,

I modified the code template we were given for part 2 (SARSA) as shown below, and made the same change for part 3. The lines I added are marked "#added" and the line I removed is marked "removed".

I changed the code this way because, as I understand the algorithm, when you are in (s', a') you update Q(s, a) (for a 1-step horizon), and the action a' you chose for that update is also the one you play next. Is that correct?
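
To make the question concrete, here is a minimal sketch of the single update I have in mind. The function name sarsa_update and the toy numbers are made up for illustration and are not part of the exercise template; eta and gamma match the values used below.

```python
import numpy as np

def sarsa_update(Q, s, a, r, s_next, a_next, eta=0.1, gamma=0.9):
    """One on-policy SARSA update of Q[s, a], done in place (hypothetical helper)."""
    # The target uses the value of the action a_next that was actually sampled,
    # not the maximum over actions (that would be Q-learning).
    td_target = r + gamma * Q[s_next, a_next]
    Q[s, a] += eta * (td_target - Q[s, a])
    return Q

# Toy check with made-up numbers: 2 states, 2 actions.
Q = np.zeros((2, 2))
Q[1, 0] = 1.0  # value of the sampled pair (s', a')
Q[1, 1] = 5.0  # a better-looking alternative that SARSA does not use

sarsa_update(Q, s=0, a=0, r=1.0, s_next=1, a_next=0)
print(Q[0, 0])  # approximately 0.1 * (1.0 + 0.9 * 1.0 - 0.0) = 0.19
```

In the loop, the same a_next is then carried over as the action for the following step instead of sampling a fresh one.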


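import numpy as np
import matplotlib.pyplot as plt

# Maze, pick_action, cell_to_ind and calc_V are assumed to come from the exercise template
n = 5
env = Maze(n)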

Q_table = np.zeros((env.n_states, env.n_actions))
state = env.current_state
eta = 0.1 # learning rate
gamma = 0.9 # discount rate
eps = 0.1 # exploration rate

num_episodes = 100
rewards = np.zeros(num_episodes)

for itr in range(num_episodes):
   
    action = pick_action(state, Q_table, epsilon=eps) #added
   
    while not env.done:
        state_ind = cell_to_ind(state, n)
        #action = pick_action(state, Q_table, epsilon=eps)  removed
        next_state, reward, done = env.step(action)
        next_state_ind = cell_to_ind(next_state, n)
        next_action = pick_action(next_state, Q_table, epsilon=eps) #added
        ######## code here #############################
        # update Q-table using the iterative update rule
       
        td_target = reward + gamma * Q_table[next_state_ind, next_action]
        Q_table[state_ind, action] += eta * (td_target - Q_table[state_ind, action])
   
        ######## code here ############################
        rewards[itr] += reward

        state = next_state
        action = next_action #added
   
    env.reset()
    state = env.current_state
       
print(calc_V(Q_table, epsilon=eps))
plt.plot(rewards)


Best regards.

Müller Nathan

In reply to Nathan Samuel Müller

Re: Coding exercise 1

by Berfin Simsek

Hello Nathan,

Thanks a lot for pointing this out! We have updated the code accordingly; you will get an announcement about it in a minute. Also, in part 3 you were supposed to reset the eligibility traces to 0 after each episode, and we have now added that part too.
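
As a rough illustration of that reset, here is a minimal sketch of tabular SARSA(λ) with accumulating traces on a toy chain. It is not the course code: the chain environment, lam, max_steps, and every function name are hypothetical, and the random tie-breaking in the action choice is an assumption added so the toy example explores from an all-zero table.

```python
import numpy as np

def pick_action_eps(Q, s, eps, rng):
    """Epsilon-greedy choice over row s of Q, ties broken at random (hypothetical helper)."""
    if rng.random() < eps:
        return int(rng.integers(Q.shape[1]))
    best = np.flatnonzero(Q[s] == Q[s].max())
    return int(rng.choice(best))

def chain_step(s, a, n_states):
    """Toy chain: action 1 moves right, action 0 moves left; reward 1 on reaching the right end."""
    s_next = min(s + 1, n_states - 1) if a == 1 else max(s - 1, 0)
    done = s_next == n_states - 1
    return s_next, (1.0 if done else 0.0), done

def sarsa_lambda(num_episodes=50, n_states=5, eta=0.1, gamma=0.9,
                 lam=0.8, eps=0.1, max_steps=1000, seed=0):
    rng = np.random.default_rng(seed)
    Q = np.zeros((n_states, 2))
    for _ in range(num_episodes):
        E = np.zeros_like(Q)  # eligibility traces are reset to 0 for every episode
        s = 0
        a = pick_action_eps(Q, s, eps, rng)
        done = False
        steps = 0
        while not done and steps < max_steps:  # max_steps only guards against very long episodes
            s_next, r, done = chain_step(s, a, n_states)
            a_next = pick_action_eps(Q, s_next, eps, rng)
            # The terminal state is treated as having value 0.
            target = r if done else r + gamma * Q[s_next, a_next]
            delta = target - Q[s, a]
            E[s, a] += 1.0        # accumulating trace for the visited pair
            Q += eta * delta * E  # every recently visited pair shares the TD error
            E *= gamma * lam      # traces decay after each step
            s, a = s_next, a_next
            steps += 1
    return Q

Q = sarsa_lambda()
print(np.round(Q, 2))
```

Without the reset, traces left over from the end of one episode would leak credit into state-action pairs at the start of the next one.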

Best,

Berfin