Hi,
Could you please explain to me why the policy is set to greedy at the end of an episode (last line of the algorithm)?
Hi,
If I see correctly, the full title of this algorithm reads 'Monte Carlo ES for estimating π ≈ π*'. Usually π* denotes the optimal policy, i.e. the one that maximises the expected return. If we want to estimate the optimal policy, the greedy policy (with respect to the current estimates of Q) seems to be the best pick.
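However, to avoid getting stuck with a suboptimal policy, some exploration is usually added, especially at the beginning of the algorithm. In Monte Carlo ES this exploration comes from the exploring starts, i.e. episodes that begin from randomly chosen state-action pairs.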
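To make this concrete, here is a minimal Python sketch of first-visit Monte Carlo ES, loosely following the Sutton & Barto pseudo-code. Everything about the environment is an assumption for illustration only: a hypothetical five-state corridor (`step`, `TERMINAL`), a `MAX_STEPS` cap so that a policy that loops forever still ends its episode, and a running mean in place of the `Returns(s, a)` lists of the book's pseudo-code. The line to look at is the greedy update `policy[s] = max(...)`: because exploring starts give every state-action pair a chance to begin an episode, the policy itself can stay fully greedy without starving any action of samples.

```python
import random
from collections import defaultdict

# Hypothetical toy task (not from the lecture): a corridor with states 0..4.
# State 4 is terminal; action 0 moves left, action 1 moves right.
TERMINAL = 4
ACTIONS = (0, 1)
MAX_STEPS = 50  # assumed cap so a policy that loops forever still ends the episode


def step(state, action):
    """Deterministic transition; every step costs -1, so short paths are best."""
    if action == 1:
        return min(state + 1, TERMINAL), -1.0
    return max(state - 1, 0), -1.0


def generate_episode(policy, s0, a0):
    """Exploring start (s0, a0), then follow the current deterministic policy."""
    episode = []
    state, action = s0, a0
    for _ in range(MAX_STEPS):
        next_state, reward = step(state, action)
        episode.append((state, action, reward))
        if next_state == TERMINAL:
            break
        state, action = next_state, policy[next_state]
    return episode


def monte_carlo_es(n_episodes=5000, gamma=1.0, seed=0):
    rng = random.Random(seed)
    policy = {s: rng.choice(ACTIONS) for s in range(TERMINAL)}  # arbitrary initial policy
    q = defaultdict(float)     # Q(s, a); unvisited pairs default to 0.0
    counts = defaultdict(int)  # number of first-visit returns averaged into Q(s, a)

    for _ in range(n_episodes):
        # Exploring starts: every (state, action) pair can begin an episode.
        s0, a0 = rng.randrange(TERMINAL), rng.choice(ACTIONS)
        episode = generate_episode(policy, s0, a0)

        # Index of the first occurrence of each (state, action) pair.
        first_visit = {}
        for t, (s, a, _) in enumerate(episode):
            first_visit.setdefault((s, a), t)

        g = 0.0
        for t in reversed(range(len(episode))):
            s, a, r = episode[t]
            g = gamma * g + r
            if first_visit[(s, a)] == t:
                counts[(s, a)] += 1
                q[(s, a)] += (g - q[(s, a)]) / counts[(s, a)]  # running mean of returns
                # Greedy improvement, the last line of the pseudo-code:
                # pi(s) <- argmax_a Q(s, a)
                policy[s] = max(ACTIONS, key=lambda act: q[(s, act)])
    return policy, q


if __name__ == "__main__":
    learned_policy, _ = monte_carlo_es()
    print(learned_policy)
```

Calling `monte_carlo_es()` returns the learned policy and the Q table. On this toy task the policy should end up choosing action 1 (move right) in every non-terminal state, since every extra step costs -1; the exploring starts are what guarantee the "left" actions still get evaluated even though the greedy policy never picks them.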
Best,
Bernd
Thanks a lot.
Where did you find the full title?
Best regards.
Müller Nathan
Hi,
most (if not all) of the algorithm pseudo-codes presented in the lecture are taken from the Sutton & Barto book, in this case from the chapter on Monte Carlo methods (p. 99 in my edition of the book).
Sometimes the lecture slides hide the title so as not to confuse students with unnecessary details. In this case it might have been counterproductive ;)
Best,
Bernd
Thank you very much. Next time part of a title has been hidden and I'm not sure why the algorithm does something, I will look it up in the book.
© 2023 EPFL, all rights reserved