CS-456: MP1 | Moodle

MP1

Hello!
Solving the problem in this context means obtaining a good policy that consistently reaches the final state in fewer than 200 steps. It will not achieve the same reward in every episode, because the agent is placed stochastically in the environment. There is, however, a pattern emerging, which we ask you about later in the document.
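One way to make "consistently" concrete is to measure the fraction of evaluation episodes that finish before the step limit. The snippet is only a minimal sketch: the function name `success_rate`, the constant `MAX_STEPS`, and the example episode lengths are hypothetical, and it assumes that an episode which never reaches the final state is recorded with a length of exactly 200.

```python
from typing import Sequence

MAX_STEPS = 200  # step limit mentioned above


def success_rate(episode_lengths: Sequence[int], max_steps: int = MAX_STEPS) -> float:
    """Return the fraction of episodes that ended before the step limit.

    Assumption: an episode that hits the limit without reaching the
    final state is recorded with length == max_steps.
    """
    if not episode_lengths:
        raise ValueError("need at least one episode")
    solved = sum(1 for n in episode_lengths if n < max_steps)
    return solved / len(episode_lengths)


# Hypothetical episode lengths, for illustration only (not real results).
lengths = [143, 200, 158, 121, 200, 137, 149, 162]
print(success_rate(lengths))  # 6 of 8 episodes finish early -> 0.75
```

A rate close to 1.0 over many evaluation episodes would match the notion of a good policy described here, even though the reward of individual episodes still varies with the starting position.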
