CS-456: Monte Carlo

Hi,

Could you please explain to me why the policy is set to greedy at the end of an episode (the last line of the algorithm)?

Image from lecture 3, slide 45.

Thank you.

Müller Nathan

Re: Monte Carlo

by Bernd Albert Illing - Monday, 16 March 2020, 4:26 PM

Hi,

If I see correctly, the full title of this algorithm reads 'Monte Carlo ES for estimating $\pi \approx \pi_{*}$'. Usually $\pi_{*}$ denotes the optimal policy, i.e. the one that maximises the expected return. If we want to estimate the optimal policy, making the policy greedy with respect to the current action-value estimates $Q$ seems to be the best pick.

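However, to avoid getting stuck with a suboptimal policy, some exploration is usually needed, especially at the beginning of learning. In Monte Carlo ES ('ES' stands for exploring starts), this exploration comes from starting each episode with a randomly chosen state-action pair, so the policy followed afterwards can be fully greedy.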
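When exploring starts are impractical, a common way to add exploration to the policy itself is ε-greedy action selection, with an ε that shrinks over time so that there is more exploration early on. The following is a minimal Python sketch; the function names and the linear decay schedule (from 1.0 down to 0.05 over 500 episodes) are illustrative assumptions, not taken from the lecture.

```python
import random


def epsilon_greedy_action(q_values, epsilon, rng):
    """With probability epsilon pick a uniformly random action, else a greedy one."""
    if rng.random() < epsilon:
        return rng.randrange(len(q_values))
    # Greedy choice: index of the largest Q-value (ties go to the lowest index).
    return max(range(len(q_values)), key=lambda a: q_values[a])


def decayed_epsilon(episode, eps_start=1.0, eps_end=0.05, decay_episodes=500):
    """Hypothetical schedule: decay epsilon linearly, then keep it at eps_end."""
    fraction = min(episode / decay_episodes, 1.0)
    return eps_start + fraction * (eps_end - eps_start)


rng = random.Random(0)
q_values = [0.2, 1.5, -0.3]  # made-up Q(s, a) estimates for a single state

# Early on (epsilon = 1.0) actions are uniform; later (epsilon ~ 0.05) mostly greedy.
early = [epsilon_greedy_action(q_values, decayed_epsilon(0), rng) for _ in range(1000)]
late = [epsilon_greedy_action(q_values, decayed_epsilon(10_000), rng) for _ in range(1000)]
print(early.count(1) / 1000, late.count(1) / 1000)
```

Setting `epsilon=0` recovers the purely greedy policy from the last line of the algorithm; keeping a small floor such as 0.05 means every action is still tried occasionally.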

Best,

Bernd

Re: Monte Carlo

by Nathan Samuel Müller - Monday, 16 March 2020, 6:33 PM

Thanks a lot.

Where did you find the full title?

Best regards.

Müller Nathan

Re: Monte Carlo

by Bernd Albert Illing - Monday, 16 March 2020, 7:21 PM

Hi,

Most, if not all, of the pseudocode algorithms presented in the lecture are taken from the Sutton & Barto book, in this case from the chapter on Monte Carlo methods (p. 99 in my edition of the book).

Sometimes the lecture slides hide the title so as not to confuse students with unnecessary details. In this case, that might have been counterproductive ;)
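For readers without the book at hand, the Monte Carlo ES pseudocode from that chapter can be sketched in Python roughly as follows. This is a minimal sketch, not the book's or the lecture's code: it assumes a hypothetical toy MDP (three non-terminal states in a row, where action 1 earns +1 and action 0 earns 0) and averages returns incrementally instead of storing a Returns(s, a) list. The greedy update at the end of the inner loop corresponds to the last line of the algorithm; exploration comes only from the random starting state-action pair of each episode.

```python
import random
from collections import defaultdict

# Hypothetical toy MDP (an assumption, not from the lecture): states 0, 1, 2
# lead one by one to the terminal state 3. Both actions advance by one state,
# but action 1 pays +1 and action 0 pays 0, so the optimal policy is "always 1".
TERMINAL = 3
STATES = (0, 1, 2)
ACTIONS = (0, 1)


def step(state, action):
    """Return (next_state, reward) for the toy MDP."""
    return state + 1, float(action)


def generate_episode(policy, start_state, start_action):
    """Roll out one episode as a list of (state, action, reward) triples."""
    episode = []
    state, action = start_state, start_action
    while True:
        next_state, reward = step(state, action)
        episode.append((state, action, reward))
        if next_state == TERMINAL:
            return episode
        state = next_state
        action = policy[state]  # after the exploring start, follow the policy


def monte_carlo_es(num_episodes=2000, gamma=1.0, seed=0):
    rng = random.Random(seed)
    policy = {s: rng.choice(ACTIONS) for s in STATES}  # arbitrary initial policy
    q = defaultdict(float)     # Q(s, a) estimates, 0.0 until first visited
    counts = defaultdict(int)  # number of returns averaged into Q(s, a)

    for _ in range(num_episodes):
        # Exploring start: every (state, action) pair can begin an episode.
        s0, a0 = rng.choice(STATES), rng.choice(ACTIONS)
        episode = generate_episode(policy, s0, a0)

        g = 0.0
        for t in range(len(episode) - 1, -1, -1):
            state, action, reward = episode[t]
            g = gamma * g + reward
            # First-visit check: skip if (state, action) occurred earlier.
            if (state, action) in [(s, a) for s, a, _ in episode[:t]]:
                continue
            counts[(state, action)] += 1
            # Incremental form of Q(s, a) <- average(Returns(s, a)).
            q[(state, action)] += (g - q[(state, action)]) / counts[(state, action)]
            # Last line of the algorithm: make the policy greedy in this state.
            policy[state] = max(ACTIONS, key=lambda a: q[(state, a)])

    return policy, q


policy, q = monte_carlo_es()
print(policy)  # expected: {0: 1, 1: 1, 2: 1}
```

Because every episode starts from a random state-action pair, each action in each state keeps being evaluated, so the greedy step can correct a wrong early choice once its Q-estimate improves.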

Best,

Bernd

Re: Monte Carlo

by Nathan Samuel Müller - Tuesday, 17 March 2020, 11:51 AM

Thank you very much. Next time part of a title has been hidden and I'm not sure why the algorithm does something, I will look it up in the book.