Dear TAs,
I have a question regarding "REINFORCE with baseline". In the lecture, the update for the value network parameters (w) includes a discount factor (\gamma^t), which does not make sense to me. I checked various sources, including the 2018 edition of the RL book, and they omit the discount factor when updating the value network. In the homework implementation we also simply used the MSE loss, which omits the factor as well. Are both versions acceptable, or does one make more sense than the other?
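To make the comparison concrete, this is how I understand the two updates. The symbols \alpha^w (step size), G_t (return from step t), \hat{v} (value estimate) and \delta_t are my own notation and may not match the lecture slides exactly.

```latex
\delta_t = G_t - \hat{v}(S_t, w)

% Lecture version (with the discount factor):
w \leftarrow w + \alpha^{w} \, \gamma^{t} \, \delta_t \, \nabla_{w} \hat{v}(S_t, w)

% Literature / homework version (without the discount factor):
w \leftarrow w + \alpha^{w} \, \delta_t \, \nabla_{w} \hat{v}(S_t, w)
```

As far as I can tell, the second form is just a stochastic gradient step on the squared error \frac{1}{2}(G_t - \hat{v}(S_t, w))^2, which is why it matches the MSE loss in the homework up to a constant absorbed into the step size.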
Thanks,
Wanhao