Model-based RL criteria for exploration term

by Maria Yuffa Meshcheryakova
Number of replies: 1

Dear TAs,

I am looking at the explanation for slide 18 of the model-based RL lecture (week 9; slide 18 depicts the AlphaZero algorithm, and I am attaching the excerpt I am confused about). It refers to the criteria for the exploration term (highlighted in purple), but I wasn't able to find them on slide 12. Could you please either restate the criteria or point me to the slide where they appear?

[Image: explanation slide for slide 18, with the explanation of the choice of exploration term highlighted in pink]

Thank you in advance!

Best wishes,

Maria

In reply to Maria Yuffa Meshcheryakova

Re: Model-based RL criteria for exploration term

by Ariane Delrocq
The criteria are actually discussed in the notes for slide 13:

The exploration term should be such that it decreases with N(s,a) (actions with low N(s,a) should be explored) and increases slowly with N(s) (if s is visited often, we want to be really sure that none of the less taken actions would in fact be optimal; an increase in N(s) drives occasional re-exploration).
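
To make the two criteria concrete, here is a minimal sketch using the PUCT-style bonus from the AlphaZero paper, c * P(s,a) * sqrt(N(s)) / (1 + N(s,a)). The exact form used on the lecture slides may differ; the function name `exploration_term`, the default values of `prior` and `c`, and the visit counts are assumptions chosen only for illustration.

```python
import math

def exploration_term(n_s, n_sa, prior=1.0, c=1.0):
    """PUCT-style exploration bonus (illustrative, not necessarily the lecture's form).

    n_s   : N(s), number of visits to state s
    n_sa  : N(s,a), number of times action a was taken in s
    prior : P(s,a), prior probability of a (assumed 1.0 here)
    c     : exploration constant (assumed 1.0 here)
    """
    return c * prior * math.sqrt(n_s) / (1 + n_sa)

# Criterion 1: the bonus decreases with N(s,a).
bonus_rare = exploration_term(n_s=100, n_sa=1)     # rarely tried action
bonus_common = exploration_term(n_s=100, n_sa=50)  # frequently tried action

# Criterion 2: the bonus increases slowly (here as a square root) with N(s).
bonus_later = exploration_term(n_s=400, n_sa=1)    # same action, s visited 4x more

print(bonus_rare, bonus_common, bonus_later)
```

With these numbers, quadrupling N(s) only doubles the bonus, while raising N(s,a) from 1 to 50 shrinks it by more than an order of magnitude. A rarely taken action is therefore revisited occasionally as N(s) grows, but it does not dominate the search.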