Monte Carlo

by Nathan Samuel Müller
Number of replies: 4

Hi,

Could you please explain to me why the policy is set to greedy at the end of an episode (last line of the algorithm)?

Image from lecture 3, slide 45.

Thank you.

Müller Nathan

In reply to Nathan Samuel Müller

Re: Monte Carlo

by Bernd Albert Illing

Hi,

If I see correctly, the full title of this algorithm reads 'Monte Carlo ES for estimating $\pi \approx \pi_*$'. Usually $\pi_*$ denotes the optimal policy, i.e. the one that maximises the expected return. Since the goal is to estimate the optimal policy, making the policy greedy with respect to the current action-value estimates is the natural choice: it is the policy-improvement step of the algorithm.

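However, to avoid getting stuck with a suboptimal policy, some exploration is usually added, especially at the beginning of the algorithm. In Monte Carlo ES this comes from the exploring starts: each episode begins with a randomly chosen state-action pair.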
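To make the greedy step concrete, here is a minimal Python sketch of first-visit Monte Carlo with exploring starts. It is not the code from the lecture or the book. The corridor environment, the constants `GAMMA` and `MAX_STEPS`, and the function names are all assumptions made up for illustration. The step-count cap in particular is an addition, so that episodes still end when a greedy policy loops.

```python
import random
from collections import defaultdict

# Hypothetical toy task (not from the lecture): a corridor with states 0, 1, 2
# and a terminal goal state 3. Every step costs -1, so shorter paths are better.
STATES = [0, 1, 2]
ACTIONS = [0, 1]          # 0 = left, 1 = right
GOAL = 3
GAMMA = 0.9
MAX_STEPS = 50            # assumption: cap episode length so looping policies still end


def step(state, action):
    """Deterministic transition: returns (next_state, reward)."""
    next_state = max(0, state - 1) if action == 0 else state + 1
    return next_state, -1.0


def generate_episode(policy, rng):
    """Exploring start: random first state-action pair, then follow the policy."""
    state, action = rng.choice(STATES), rng.choice(ACTIONS)
    episode = []
    for _ in range(MAX_STEPS):
        next_state, reward = step(state, action)
        episode.append((state, action, reward))
        if next_state == GOAL:
            break
        state, action = next_state, policy[next_state]
    return episode


def monte_carlo_es(num_episodes=10_000, seed=0):
    rng = random.Random(seed)
    q = defaultdict(float)                 # action-value estimates Q(s, a)
    visits = defaultdict(int)              # number of returns averaged into Q(s, a)
    policy = {s: 0 for s in STATES}        # deliberately poor start: always go left

    for _ in range(num_episodes):
        episode = generate_episode(policy, rng)
        # Time index of the first occurrence of each (state, action) pair.
        first_visit = {}
        for t, (s, a, _) in enumerate(episode):
            first_visit.setdefault((s, a), t)

        g = 0.0
        for t in reversed(range(len(episode))):
            s, a, r = episode[t]
            g = GAMMA * g + r              # discounted return from time t
            if first_visit[(s, a)] == t:
                visits[(s, a)] += 1
                q[(s, a)] += (g - q[(s, a)]) / visits[(s, a)]   # running mean
                # Policy improvement: act greedily w.r.t. the current Q.
                policy[s] = max(ACTIONS, key=lambda b: q[(s, b)])
    return policy, dict(q)


policy, q = monte_carlo_es()
print(policy)
```

The line that sets `policy[s]` is the greedy update asked about in the question. The random first state-action pair in `generate_episode` is the only source of exploration in this sketch.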

Best,

Bernd 

In reply to Bernd Albert Illing

Re: Monte Carlo

by Nathan Samuel Müller

Thanks a lot.

Where did you find the full title?

Best regards.

Müller Nathan

In reply to Nathan Samuel Müller

Re: Monte Carlo

by Bernd Albert Illing

Hi, 

Most, if not all, of the pseudocode for the algorithms presented in the lecture is taken from the Sutton & Barto book, in this case from the chapter on Monte Carlo methods (p. 99 in my version of the book).

Sometimes the lecture slides hide the title so as not to confuse students with unnecessary details. In this case it might have been counterproductive ;)

Best,

Bernd 

In reply to Bernd Albert Illing

Re: Monte Carlo

by Nathan Samuel Müller

Thank you very much. Next time I see that part of a title has been hidden and I'm not sure why the algorithm does something, I will look in the book.