week7/ex3 - SARSA algorithm

by Virginie Piskin
Hello, 
Would it be possible to have some more explanation of this exercise?
Specifically:
1. In question a, why is the update made with Q(s6,a2)?
2. In question b, why does the first update in the third trial happen for Q(s,a)? And why is the update made with Q(s4,a1)?

Thank you, 

Virginie 
Attachment ann_question.PNG
In reply to Virginie Piskin

Re: week7/ex3 - SARSA algorithm

by Nicolas El Maalouly

1) A nonzero reward only occurs after state s5, so the Q-value for s5 is the first one to be updated. In s6 the only possible action is up, which is a2, so the next Q-value after state s5 is Q(s6,a2).

2) This is a typo: it should be s3 instead of s (just as we had s4 in the second trial).
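
To make the two points above concrete, here is a minimal sketch of the SARSA update, Q(s,a) <- Q(s,a) + alpha * [r + gamma * Q(s',a') - Q(s,a)], applied over three trials. The learning rate alpha = 0.5, the discount gamma = 1, the reward of 1 after s5, the zero initialization, and the action labels used in s5 and s3 are all assumptions for illustration and are not taken from the exercise sheet; only the pairs Q(s6,a2) and Q(s4,a1) come from the thread.

```python
from collections import defaultdict


def sarsa_update(Q, s, a, r, s_next, a_next, alpha=0.5, gamma=1.0):
    """Apply one SARSA update to Q[(s, a)] in place and return the new value.

    alpha and gamma are assumed values, not the ones from the exercise.
    """
    td_target = r + gamma * Q[(s_next, a_next)]
    Q[(s, a)] += alpha * (td_target - Q[(s, a)])
    return Q[(s, a)]


# Assumed initialization: every Q-value starts at 0.
Q = defaultdict(float)

# Trial 1: the only nonzero reward (assumed to be 1) comes after s5.
# The next state is s6, where the only action is a2, so the update
# for s5 uses Q(s6, a2). The action "a1" in s5 is a hypothetical label.
q_s5 = sarsa_update(Q, "s5", "a1", r=1.0, s_next="s6", a_next="a2")

# Trial 2: the reward after s4 is 0, but Q(s5, a1) is now nonzero,
# so s4 is the first state whose Q-value changes.
q_s4 = sarsa_update(Q, "s4", "a1", r=0.0, s_next="s5", a_next="a1")

# Trial 3: by the same argument the first change happens at s3,
# and it uses Q(s4, a1). The action "a1" in s3 is a hypothetical label.
q_s3 = sarsa_update(Q, "s3", "a1", r=0.0, s_next="s4", a_next="a1")

print(q_s5, q_s4, q_s3)
```

With these assumed numbers the values come out as 0.5, 0.25 and 0.125: the information about the reward moves back by one state per trial, which is why the first update is at s4 in the second trial and at s3 in the third.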