CS-456: week7/ex3 - SARSA algorithm

Hello,
Would it be possible to have some more explanation on this exercise?
Specifically:
1. Why is the update made with Q(s6,a2) in question a?
2. In question b, why does the first update in the third trial happen for Q(s,a)? And why is the update made with Q(s4,a1)?

Thank you,

Virginie

Re: week7/ex3 - SARSA algorithm

by Nicolas El Maalouly - Thursday, 27 June 2019, 11:15 AM

1) A nonzero reward only occurs after state s5, so s5 is the first state whose Q-value gets updated. In s6 the only possible action is up, which is a2, so the Q-value that follows state s5 in the update is Q(s6,a2).
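To make the point concrete, here is a minimal sketch of a single SARSA update. The learning rate, discount factor, reward value of 1 and the action label used in s4 and s5 are assumptions for illustration only, not values taken from the exercise.

```python
from collections import defaultdict

def sarsa_update(Q, s, a, r, s_next, a_next, alpha=0.5, gamma=0.9):
    """One SARSA step: move Q(s,a) toward r + gamma * Q(s',a')."""
    td_error = r + gamma * Q[(s_next, a_next)] - Q[(s, a)]
    Q[(s, a)] += alpha * td_error
    return Q[(s, a)]

# All Q-values start at zero (assumed initialisation).
Q = defaultdict(float)

# Transition with zero reward: the target is 0, so nothing changes.
sarsa_update(Q, "s4", "a1", 0.0, "s5", "a1")

# Transition out of s5 with a nonzero reward (assumed to be 1).
# The next state is s6, where the only action is a2, so Q(s6,a2) is the bootstrap term.
sarsa_update(Q, "s5", "a1", 1.0, "s6", "a2")
```

With these assumed numbers, Q(s4,a1) stays at 0 while Q(s5,a1) becomes nonzero, which is why the first update appears at s5.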

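2) This should be s3 instead of s (just like in the second trial we had s4). Since SARSA updates a state-action pair with the Q-value of the next state-action pair, the update for s3 is made with Q(s4,a1).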
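The way the first nonzero update moves back by one state per trial can be sketched as follows. The trajectory, the action labels before s6, the reward of 1 and the parameter values are hypothetical and only meant to mirror the pattern described above, not the actual exercise.

```python
from collections import defaultdict

ALPHA, GAMMA = 0.5, 0.9  # assumed learning rate and discount factor

# Hypothetical trajectory: (state, action, reward received after the action).
# The last entry only supplies the next state-action pair for the s5 update.
EPISODE = [("s3", "a1", 0.0), ("s4", "a1", 0.0), ("s5", "a1", 1.0), ("s6", "a2", None)]

def run_trial(Q):
    """Apply SARSA along EPISODE and return the (s, a) pairs whose value changed."""
    changed = []
    for (s, a, r), (s_next, a_next, _) in zip(EPISODE, EPISODE[1:]):
        old = Q[(s, a)]
        Q[(s, a)] += ALPHA * (r + GAMMA * Q[(s_next, a_next)] - old)
        if Q[(s, a)] != old:
            changed.append((s, a))
    return changed

Q = defaultdict(float)  # all Q-values start at zero (assumed)

# First state-action pair that changes in trials 1, 2 and 3.
first_changed = [run_trial(Q)[0] for _ in range(3)]
print(first_changed)
```

Under these assumptions the first changed pair is at s5 in the first trial, s4 in the second and s3 in the third, because each trial can only bootstrap from a value that became nonzero in the previous one.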

Re: week7/ex3 - SARSA algorithm

by Virginie Piskin - Sunday, 30 June 2019, 7:13 PM

Thank you!