lecture about actor critic

by Junye Du

Dear Prof and TAs:

I would like to ask about the second true-or-false question: why is it that under the condition δ = r_{t+1} we get REINFORCE without baseline, and not under δ = r_{t+1} + γ V(s_{t+1}), given that in the policy gradient G is the cumulative reward with the discount factor?

Thank you very much!