Reinforce with Baseline

by Riccardo Brioschi
Hello,

when using the Monte Carlo update in 'Reinforce with Baseline' to update the V-values, do we consider only the discounted reward obtained in the current episode, or do we average the discounted rewards obtained over all previous episodes?

Thanks in advance!

In reply to Riccardo Brioschi

Re: Reinforce with Baseline

by Wulfram Gerstner

For the V-value you should always use a good estimate. Whether you obtain it with an online update (over many episodes) or with a batch update (over many episodes) does not matter. But yes, the estimate should be based on many episodes.
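
To illustrate the point, here is a minimal sketch, not taken from the course material, that estimates the V-value of a single start state from many Monte Carlo episodes in two ways: an online running update after each episode and a batch average over all episodes. The toy environment (five Gaussian rewards per episode), the discount factor, and all function names are assumptions made for illustration only. With a step size of 1/n the online update reproduces the batch average; a constant step size or a function approximator would give a similar but not identical estimate.

```python
import random


def discounted_return(rewards, gamma):
    """Discounted sum of rewards of one episode, seen from its first step."""
    g = 0.0
    for r in reversed(rewards):
        g = r + gamma * g
    return g


def online_v_estimate(returns):
    """Online update: move V toward each new return with step size 1/n."""
    v = 0.0
    for n, g in enumerate(returns, start=1):
        v += (g - v) / n
    return v


def batch_v_estimate(returns):
    """Batch update: average the returns of all episodes at once."""
    return sum(returns) / len(returns)


# Hypothetical toy task: every episode starts in the same state and
# yields five noisy rewards with mean 1.0 (an assumption, not from the thread).
rng = random.Random(0)
gamma = 0.9
episodes = [[rng.gauss(1.0, 0.5) for _ in range(5)] for _ in range(1000)]
returns = [discounted_return(ep, gamma) for ep in episodes]

v_online = online_v_estimate(returns)
v_batch = batch_v_estimate(returns)
print(f"online: {v_online:.4f}  batch: {v_batch:.4f}")
```

In both cases the estimate used as the baseline rests on many episodes; the return of a single episode alone would be a very noisy estimate of V.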