Reinforce with Baseline

by Riccardo Brioschi,
Number of replies: 2

Hello,

When using the Monte Carlo update in 'Reinforce with Baseline' to update the V values, do we use only the discounted return obtained in the current episode, or the average of the returns obtained over all previous episodes?

Thanks in advance!

In reply to Riccardo Brioschi

Re: Reinforce with Baseline

by Wulfram Gerstner,

For the V-value you should always use a good estimate. Whether you obtain it with an online update (over many episodes) or a batch update (over many episodes) does not matter. But yes: many episodes.
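To make the online-vs-batch distinction concrete, here is a minimal Python sketch. It assumes a tabular setting (discrete, hashable states) and every-visit Monte Carlo. The names `discounted_returns`, `OnlineBaseline`, `batch_baseline` and `advantages` are hypothetical and do not come from the course material. The online version nudges V(s) toward each new return after every episode. The batch version averages all returns collected so far. With step size 1/n the two produce the same estimate. With a constant step size `alpha`, the online estimate weights recent episodes more heavily.

```python
from collections import defaultdict


def discounted_returns(rewards, gamma):
    """Return G_t = r_t + gamma * G_{t+1} for every time step of one episode."""
    returns = [0.0] * len(rewards)
    g = 0.0
    for t in reversed(range(len(rewards))):
        g = rewards[t] + gamma * g
        returns[t] = g
    return returns


class OnlineBaseline:
    """Online (incremental) every-visit Monte Carlo estimate of V(s).

    Assumption: tabular states. alpha=None uses step 1/n (exact running mean);
    a float alpha gives a constant step size that tracks recent episodes.
    """

    def __init__(self, alpha=None):
        self.alpha = alpha
        self.v = defaultdict(float)
        self.n = defaultdict(int)

    def update(self, states, returns):
        # Move V(s) a step toward the return observed from s in this episode.
        for s, g in zip(states, returns):
            self.n[s] += 1
            step = self.alpha if self.alpha is not None else 1.0 / self.n[s]
            self.v[s] += step * (g - self.v[s])


def batch_baseline(episodes, gamma):
    """Batch every-visit Monte Carlo estimate: mean of all returns seen from each state."""
    totals = defaultdict(float)
    counts = defaultdict(int)
    for states, rewards in episodes:
        for s, g in zip(states, discounted_returns(rewards, gamma)):
            totals[s] += g
            counts[s] += 1
    return {s: totals[s] / counts[s] for s in totals}


def advantages(states, rewards, v, gamma):
    """Baseline-subtracted returns G_t - V(s_t) that scale the policy-gradient step."""
    return [g - v.get(s, 0.0) for s, g in zip(states, discounted_returns(rewards, gamma))]
```

A usage sketch on three toy episodes, each given as `(states, rewards)`: feed every episode to `OnlineBaseline.update` in turn, or pass the whole list to `batch_baseline`; with the default 1/n step both end up with the same V. Note also why one episode is not enough: if V(s_t) were set to the current episode's own return G_t, the baseline-subtracted term G_t - V(s_t) would be exactly zero and the policy would not change.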