CS-456: Clarifications on the A2C algorithm (MP2)

Clarifications on the A2C algorithm (MP2)

I initially understood that the data collection step (policy rollout) and the learning step (network updates) needed to be conducted sequentially.

This is correct.

So, should we not wait for the end of the episode?

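No, you shouldn't wait for the end of an episode. Collection and learning still alternate, but in short blocks: after every n steps in the environment (n steps per environment, if you run several), you do one learning step, whether or not an episode has ended in the meantime.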
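To make the schedule concrete, here is a minimal sketch of the loop structure only. It is not the assignment's reference solution: `ToyEnv`, `run`, and all parameter names are hypothetical, the action is sampled at random instead of from an actor network, and the learning step is replaced by a record of when it would happen.

```python
import random


class ToyEnv:
    """Hypothetical stand-in environment: every episode lasts a fixed number of steps."""

    def __init__(self, episode_length=7):
        self.episode_length = episode_length
        self.t = 0

    def reset(self):
        self.t = 0
        return 0.0

    def step(self, action):
        self.t += 1
        done = self.t >= self.episode_length
        return float(self.t), 1.0, done  # observation, reward, done


def run(num_envs=2, n=3, total_steps_per_env=12, seed=0):
    """Alternate between collecting n steps per environment and one learning step.

    Returns a list of (step index, number of transitions in the batch),
    one entry per learning step.
    """
    rng = random.Random(seed)
    envs = [ToyEnv() for _ in range(num_envs)]
    obs = [env.reset() for env in envs]
    updates = []
    rollout = []  # holds at most n entries, one per time step

    for step in range(1, total_steps_per_env + 1):
        transitions = []
        for i, env in enumerate(envs):
            action = rng.choice([0, 1])  # placeholder for sampling from the actor
            next_obs, reward, done = env.step(action)
            transitions.append((obs[i], action, reward, done))
            # An episode ending only resets that environment; it does not trigger learning.
            obs[i] = env.reset() if done else next_obs
        rollout.append(transitions)

        if len(rollout) == n:
            # Placeholder for the learning step. A real implementation would compute
            # n-step returns here (bootstrapping from the critic unless done is True)
            # and update the actor and critic once.
            updates.append((step, sum(len(t) for t in rollout)))
            rollout = []  # start collecting the next block of n steps

    return updates


if __name__ == "__main__":
    print(run())
```

With the defaults, episodes end after 7 steps, yet the learning steps fall at steps 3, 6, 9 and 12, each on a batch of n × num_envs = 6 transitions. The episode boundary at step 7 sits inside a collection block and does not change the schedule.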
