REINFORCE with baseline algorithm

by Wanhao Zhou

Dear TAs,

I have a question about "REINFORCE with baseline". In the lecture, the update of the value-network parameters (w) includes a discount factor (\gamma^t), which does not make sense to me. I checked various references, including the 2018 edition of the RL book by Sutton and Barto, and they omit the discount factor when updating the value network. In the homework implementation we also simply used the MSE loss, which omits the factor. Are both versions okay, or does one make more sense?
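For reference, the two value-network updates being compared can be written as follows. This is a sketch in Sutton and Barto's notation, assuming a Monte Carlo return G_t, a differentiable baseline \hat v(S_t, \mathbf w), and a step size \alpha^{\mathbf w}. Since G_t does not depend on \mathbf w, the MSE version is an exact gradient step on \tfrac12\delta_t^2:

```latex
\begin{aligned}
\delta_t &= G_t - \hat v(S_t, \mathbf w) \\
\text{with } \gamma^t:\quad \mathbf w &\leftarrow \mathbf w + \alpha^{\mathbf w}\,\gamma^t\,\delta_t\,\nabla_{\mathbf w}\hat v(S_t, \mathbf w) \\
\text{MSE:}\quad \mathbf w &\leftarrow \mathbf w - \tfrac{\alpha^{\mathbf w}}{2}\,\nabla_{\mathbf w}\,\delta_t^2
  = \mathbf w + \alpha^{\mathbf w}\,\delta_t\,\nabla_{\mathbf w}\hat v(S_t, \mathbf w)
\end{aligned}
```

The two lines differ only by the factor \gamma^t: the first is the second with step size \alpha^{\mathbf w}\gamma^t at time t.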


Thanks,

Wanhao 

In reply to Wanhao Zhou

Re: REINFORCE with baseline algorithm

by Nicolas El Maalouly

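Both are correct: \gamma^t is a positive scalar, so including it only rescales the learning rate at time step t (effectively \alpha\gamma^t instead of \alpha). That said, for the value network it is probably better to remove it, since otherwise states late in the episode (large t) receive very small updates. The Sutton and Barto book had the factor in older versions but removed it in later ones.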
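As a rough illustration of this answer, here is a minimal sketch in plain Python. It assumes a linear baseline v_hat(s, w) = w . x(s) with hand-made one-hot features; the function names `discounted_returns` and `baseline_update` are hypothetical and not taken from the course or homework code. It runs one episode of baseline updates with and without the \gamma^t factor.

```python
from typing import List, Sequence


def discounted_returns(rewards: Sequence[float], gamma: float) -> List[float]:
    """G_t = R_{t+1} + gamma * G_{t+1}, where rewards[t] holds R_{t+1}."""
    returns = [0.0] * len(rewards)
    g = 0.0
    for t in reversed(range(len(rewards))):
        g = rewards[t] + gamma * g
        returns[t] = g
    return returns


def baseline_update(
    w: Sequence[float],
    features: Sequence[Sequence[float]],
    rewards: Sequence[float],
    alpha: float,
    gamma: float,
    discount_step: bool,
) -> List[float]:
    """One episode of value-baseline updates for a linear v_hat(s, w) = w . x(s).

    discount_step=True  -> w += alpha * gamma**t * delta * x  (lecture variant)
    discount_step=False -> w += alpha * delta * x             (plain MSE variant)
    """
    w = list(w)  # copy so the caller's weights are not modified
    returns = discounted_returns(rewards, gamma)
    for t, (x, g_t) in enumerate(zip(features, returns)):
        v_hat = sum(wi * xi for wi, xi in zip(w, x))
        delta = g_t - v_hat  # the same delta the policy update would use
        # gamma**t only rescales the step size used at time t.
        step = alpha * gamma ** t if discount_step else alpha
        # For a linear baseline, grad_w v_hat(S_t, w) is just x(S_t).
        w = [wi + step * delta * xi for wi, xi in zip(w, x)]
    return w


if __name__ == "__main__":
    T, gamma, alpha = 20, 0.9, 0.5
    # One-hot features: state S_t is visited exactly once, at time t.
    features = [[1.0 if i == t else 0.0 for i in range(T)] for t in range(T)]
    rewards = [1.0] * T
    w0 = [0.0] * T
    plain = baseline_update(w0, features, rewards, alpha, gamma, discount_step=False)
    discounted = baseline_update(w0, features, rewards, alpha, gamma, discount_step=True)
    for t in (0, T - 1):
        print(t, round(plain[t], 4), round(discounted[t], 4))
```

With these one-hot (tabular) features, each state's target G_t is the same in both variants and only the step size differs. In a 20-step episode with \gamma = 0.9, the last state's weight moves by only \gamma^{19} (roughly 0.14) times the plain update, which is the practical reason for dropping the factor in the value update.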