Hello,
We would like to clarify that, starting from Section 3.2 of Miniproject 2, all agents should be trained with stochastic rewards.