Hi,
I have a question regarding the mountain car problem , after 1000 episodes, my agent can solve the problem without any auxiliary rewards, meaning it reaches the final state every time I re-run it . When I add auxiliary rewards, it also solves the problem. However, I'm not entirely clear on what "solving the problem" means in this context. Does it mean that the agent's behavior converges to a policy that consistently achieves nearly the same reward every time it re-runs the problem? Or is it just reaching the final state?
Thanks !