CS-456: MP1 | Moodle

Hi,

I have a question about the mountain car problem. After 1000 episodes, my agent can solve the problem without any auxiliary rewards, meaning it reaches the final state every time I re-run it. When I add auxiliary rewards, it also solves the problem. However, I'm not entirely clear on what "solving the problem" means in this context. Does it mean that the agent's behavior converges to a policy that consistently achieves nearly the same reward every time it is re-run? Or does it just mean reaching the final state?

Thanks!

Re: MP1

by Anja Surina, Tuesday, 21 May 2024, 13:04

Hello!
Solving the problem in this context means obtaining a good policy that consistently reaches the final state in fewer than 200 steps. It will not achieve the same reward in every episode, because the agent is placed stochastically in the environment. But there is a pattern emerging that we later ask you about in the document.
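
One way to check this criterion is to run the policy for many episodes and record both the success rate and the per-episode return. The sketch below is only an illustration under stated assumptions: it re-implements the standard MountainCar-v0 dynamics in plain Python (the project may use a different environment wrapper), and `energy_pumping_policy` is a hypothetical hand-written stand-in for your trained agent, not the course solution.

```python
import math
import random

# Standard MountainCar-v0 constants (assumed to match the course environment).
MIN_POS, MAX_POS = -1.2, 0.6
MAX_SPEED = 0.07
GOAL_POS = 0.5
FORCE, GRAVITY = 0.001, 0.0025
MAX_STEPS = 200


def energy_pumping_policy(pos, vel):
    """Hypothetical stand-in for a trained agent: push in the direction of motion."""
    return 2 if vel >= 0 else 0  # 0 = push left, 1 = no push, 2 = push right


def run_episode(policy, rng, max_steps=MAX_STEPS):
    """Run one episode from a random start; return (total_reward, solved)."""
    pos = rng.uniform(-0.6, -0.4)  # stochastic initial position
    vel = 0.0
    total_reward = 0.0
    for _ in range(max_steps):
        action = policy(pos, vel)
        vel += (action - 1) * FORCE - GRAVITY * math.cos(3 * pos)
        vel = max(-MAX_SPEED, min(MAX_SPEED, vel))
        pos = max(MIN_POS, min(MAX_POS, pos + vel))
        if pos == MIN_POS and vel < 0:
            vel = 0.0  # the car stops when it hits the left wall
        total_reward -= 1.0  # reward of -1 per step, no auxiliary reward
        if pos >= GOAL_POS:
            return total_reward, True
    return total_reward, False  # step limit reached without hitting the goal


def evaluate(policy, n_episodes=100, seed=0):
    """Return (success_rate, list of per-episode returns)."""
    rng = random.Random(seed)
    results = [run_episode(policy, rng) for _ in range(n_episodes)]
    returns = [r for r, _ in results]
    success_rate = sum(solved for _, solved in results) / n_episodes
    return success_rate, returns


if __name__ == "__main__":
    rate, returns = evaluate(energy_pumping_policy)
    print(f"success rate: {rate:.2f}")
    print(f"return range: {min(returns):.0f} to {max(returns):.0f}")
```

A policy that solves the problem in the sense above should show a success rate close to 1, while the returns can still differ from episode to episode because the start position changes.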