CS-456: Reinforce with Baseline

Hello,

When using the Monte Carlo update in 'Reinforce with Baseline' to update the V-values, do we consider only the discounted return obtained in the current episode, or the average of the returns obtained over all previous episodes?

Thanks in advance!

Re: Reinforce with Baseline

by Wulfram Gerstner - Tuesday, 4 April 2023, 15:00

For the V-values you should always use a good estimate. Whether you obtain it with an online update (over many episodes) or a batch update (over many episodes) does not matter. But yes: many episodes.
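To illustrate the answer, here is a minimal Python sketch. It assumes a tabular V(s), and the toy episodes and function names (`discounted_returns`, `online_update`, `batch_update`) are hypothetical. For each visited state it computes the discounted return G_t within an episode. It then estimates V in two ways: online, with the running-average update V(s) <- V(s) + (G_t - V(s)) / N(s) after every episode, or as one batch average over all collected returns. With step size 1/N(s), both give the same estimate pooled over many episodes. The baseline-corrected term G_t - V(s_t) is what multiplies the log-policy gradient in REINFORCE with baseline.

```python
from collections import defaultdict


def discounted_returns(rewards, gamma):
    """Return G_t = r_t + gamma * r_{t+1} + ... for every time step t of one episode."""
    returns = []
    g = 0.0
    for r in reversed(rewards):
        g = r + gamma * g
        returns.append(g)
    returns.reverse()
    return returns


def online_update(episodes, gamma):
    """Monte Carlo V estimate, updated episode by episode (assumed step size 1/N(s))."""
    v = defaultdict(float)
    counts = defaultdict(int)
    for states, rewards in episodes:
        for s, g in zip(states, discounted_returns(rewards, gamma)):
            counts[s] += 1
            # V(s) <- V(s) + (G_t - V(s)) / N(s): running mean over all episodes so far
            v[s] += (g - v[s]) / counts[s]
    return dict(v)


def batch_update(episodes, gamma):
    """Monte Carlo V estimate computed once from the returns of all episodes."""
    totals = defaultdict(float)
    counts = defaultdict(int)
    for states, rewards in episodes:
        for s, g in zip(states, discounted_returns(rewards, gamma)):
            totals[s] += g
            counts[s] += 1
    return {s: totals[s] / counts[s] for s in totals}


# Hypothetical toy data: each episode is (visited states, rewards received).
episodes = [
    (["A", "B"], [0.0, 1.0]),
    (["A", "B"], [0.0, 0.0]),
    (["A", "B"], [1.0, 1.0]),
]
gamma = 0.9

v_online = online_update(episodes, gamma)
v_batch = batch_update(episodes, gamma)
print(v_online)
print(v_batch)

# Baseline-corrected returns G_t - V(s_t) for the last episode,
# i.e. the weights of the policy-gradient step in REINFORCE with baseline.
states, rewards = episodes[-1]
advantages = [g - v_batch[s] for s, g in zip(states, discounted_returns(rewards, gamma))]
print(advantages)
```

A constant step size alpha in place of 1/N(s) is also a common online choice. It yields a recency-weighted average rather than the exact sample mean, but it still pools information from many episodes instead of relying only on the current episode's return.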

Re: Reinforce with Baseline

by Riccardo Brioschi - Tuesday, 4 April 2023, 17:19

Thank you so much. It is clear now!