In the solutions of exercise 5 3.c the answer is given as follows:
When a=a', should we not have (1-gamma) as a coefficient? I don't understand how we are eliminating gamma when a=a'.
Q(s', a') is treated as fixed and independent of the weights. Therefore, when differentiating delta_t, \gamma multiplies the derivative of Q(s', a'), which is 0 (shown in the first equation of the answer). This term therefore disappears and only the derivative of Q(s, a) remains, whether or not a=a'.
Note that gamma still implicitly appears in delta_t.
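To make the step explicit, here is a sketch of the derivation. It rests on assumptions that are not spelled out in this thread: that the loss is E = 1/2 delta_t^2, that delta_t has the usual TD form r_t + \gamma Q(s', a') - Q(s, a), and that w stands for any weight and \eta for a learning rate (both symbols are introduced here for illustration only).

```latex
\begin{align*}
\delta_t &= r_t + \gamma\, Q(s', a') - Q(s, a)
  && \text{(assumed form of the TD error)} \\
\frac{\partial}{\partial w}\bigl[\gamma\, Q(s', a')\bigr] &= \gamma \cdot 0 = 0
  && \text{(target treated as fixed)} \\
\frac{\partial E}{\partial w}
  &= \delta_t \,\frac{\partial \delta_t}{\partial w}
   = \delta_t \left( 0 - \frac{\partial Q(s, a)}{\partial w} \right)
   = -\,\delta_t \,\frac{\partial Q(s, a)}{\partial w} \\
\Delta w &= -\,\eta\, \frac{\partial E}{\partial w}
   = \eta\, \delta_t\, \frac{\partial Q(s, a)}{\partial w}
\end{align*}
```

Under these assumptions no factor (1-gamma) can arise, even when a=a': the \gamma term is dropped before the two derivatives could ever be combined, and \gamma only enters the update through the value of delta_t.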