CS-456: MP1 | Moodle

Hello!
In this context, solving the problem means obtaining a good policy that consistently reaches the final state in fewer than 200 steps. The agent will not achieve the same reward in every episode, because it is placed stochastically in the environment at the start of each one. A pattern does emerge, though, and we ask you about it later in the document.
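If it helps to make "consistently" concrete, here is a minimal sketch of one way to check it from logged episode lengths. The function names, the 0.9 threshold, and the convention that an unfinished episode is logged with a length equal to the step limit are all assumptions made for illustration. They are not part of the assignment.

```python
from typing import Sequence


def success_rate(episode_lengths: Sequence[int], max_steps: int = 200) -> float:
    """Fraction of episodes that ended in strictly fewer than max_steps steps.

    Assumption: an episode that never reached the final state is logged
    with a length equal to max_steps.
    """
    if not episode_lengths:
        raise ValueError("need at least one episode")
    solved = sum(1 for n in episode_lengths if n < max_steps)
    return solved / len(episode_lengths)


def is_solved(episode_lengths: Sequence[int],
              max_steps: int = 200,
              min_rate: float = 0.9) -> bool:
    """True if the policy finishes early in at least min_rate of the episodes.

    min_rate is a hypothetical threshold for "consistently"; pick your own.
    """
    return success_rate(episode_lengths, max_steps) >= min_rate
```

For example, `success_rate([120, 200, 150, 199])` counts three of the four episodes as solved. Because the start is random, the individual lengths will vary between episodes even for a good policy, so it makes sense to look at many evaluation episodes and not just one.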
