REINFORCE with baseline algorithm

Re: REINFORCE with baseline algorithm

by Nicolas El Maalouly

Both are correct, since the difference only amounts to a change in the effective learning rate. It's true that for the value network it's probably better to remove it. The Sutton and Barto book had it in older versions but removed it in later ones.
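
To illustrate the learning-rate point, here is a minimal sketch. It assumes the term under discussion is a positive scalar multiplying the gradient update; the post does not spell this out, so `sgd_step`, `factor`, and all numbers below are hypothetical and not taken from the book.

```python
def sgd_step(w, grad, lr, factor=1.0):
    """One update w <- w + lr * factor * grad (hypothetical helper)."""
    return [wi + lr * factor * gi for wi, gi in zip(w, grad)]


w = [0.5, -1.0]        # made-up value-network weights
grad = [2.0, 4.0]      # stand-in for the error times the gradient of the value estimate
factor = 0.25          # assumed positive scalar multiplying the update
lr = 0.1               # step size

# Variant A: keep the extra factor in the update rule.
with_factor = sgd_step(w, grad, lr, factor)

# Variant B: drop the factor and fold it into the step size instead.
rescaled_lr = sgd_step(w, grad, lr * factor)

print(with_factor, rescaled_lr)
```

Both variants give the same weights, so under this assumption keeping or dropping the factor is equivalent to choosing a different step size. If the factor changes from step to step, the equivalence holds per step, so it behaves like a step-dependent learning rate.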