MP1

Re: MP1

by Anja Surina,
Number of replies: 0
Hello!
Solving the problem in this context means obtaining a good policy that consistently reaches the final state in fewer than 200 steps. The agent will not achieve the same reward in every episode because its starting position in the environment is random. However, a pattern does emerge, and we ask you about it later in the document.
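If it helps, here is a rough sketch of how you could check "consistently reaches the final state" in practice: run the policy for many episodes and look at the success rate and the spread of returns, rather than at a single episode. Note that `ToyLineEnv`, its reward of -1 per step, and the `always_right` policy are hypothetical stand-ins made up for this illustration; they are not the MP1 environment or the expected solution.

```python
import random
from statistics import mean

MAX_STEPS = 200  # step limit mentioned above


class ToyLineEnv:
    """Hypothetical stand-in: walk right along a line until the goal is reached."""

    def __init__(self, goal=50, seed=None):
        self.goal = goal
        self.rng = random.Random(seed)
        self.pos = 0

    def reset(self):
        # Random start position, so returns differ from episode to episode.
        self.pos = self.rng.randint(0, self.goal - 1)
        return self.pos

    def step(self, action):
        # action is -1 or +1; assumed reward of -1 for every step taken.
        self.pos = max(0, self.pos + action)
        done = self.pos >= self.goal
        return self.pos, -1.0, done


def evaluate(env, policy, episodes=100, max_steps=MAX_STEPS):
    """Return (success_rate, list of per-episode returns)."""
    returns, successes = [], 0
    for _ in range(episodes):
        state = env.reset()
        total, done = 0.0, False
        for _ in range(max_steps):
            state, reward, done = env.step(policy(state))
            total += reward
            if done:
                break
        successes += done  # counts episodes that reached the goal in time
        returns.append(total)
    return successes / episodes, returns


def always_right(state):
    # Trivial policy for the toy environment: always move towards the goal.
    return 1


rate, returns = evaluate(ToyLineEnv(seed=0), always_right)
print(f"success rate: {rate:.2f}")
print(f"return min/mean/max: {min(returns)} / {mean(returns):.1f} / {max(returns)}")
```

A policy that "solves" the task in this sense would show a success rate close to 1 even though the individual returns vary with the start position.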