CS-456: lecture about actor critic


Dear Prof and TAs:

I would like to ask about the second true-or-false question. Why does setting δ = r_{t+1} give REINFORCE without baseline, rather than setting δ = r_{t+1} + γ V(s_{t+1})? In the policy gradient, G is the cumulative discounted reward, so I expected the second expression.
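To make the comparison concrete, here are the quantities I am referring to, written out. The time indexing and the infinite-horizon sum are my assumptions about the lecture's notation and may differ from the slides.

```latex
\begin{align*}
  % Discounted return from time t (the G in the policy gradient)
  G_t &= \sum_{k=0}^{\infty} \gamma^{k}\, r_{t+k+1}
       = r_{t+1} + \gamma\, G_{t+1} \\
  % First candidate: the choice of delta stated in the question
  \delta_t &= r_{t+1} \\
  % Second candidate: the one-step target I expected instead,
  % where V(s_{t+1}) is an estimate of the expected G_{t+1}
  \delta_t &= r_{t+1} + \gamma\, V(s_{t+1})
\end{align*}
```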

Thank you very much!
