Clarifications on the A2C algorithm (MP2)

by Skander Moalla -
I initially understood that the data collection step (policy rollout) and the learning step (network updates) needed to be conducted sequentially.

This is correct.

So, should we not wait for the end of the episode?

However, you shouldn't wait for the end of an episode. After every n steps in the environment (n steps per environment), you perform one learning step, even if an episode is still in progress.
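Here is a minimal plain-Python sketch of this schedule, assuming the usual n-step bootstrapped returns used in A2C. `ToyEnv`, the constant `policy` and `value` stubs, `n_step_returns` and `train` are hypothetical stand-ins, not the MP2 starter code. Each update consumes exactly `n_steps` transitions per environment, even when an episode ends partway through the rollout. The critic's value of the last observed state stands in for the rewards beyond the rollout, and the done flag stops that bootstrap from leaking across episode boundaries.

```python
import random


class ToyEnv:
    """Hypothetical episodic environment: episodes last 3-7 steps, reward 1.0 per step."""

    def __init__(self, seed):
        self.rng = random.Random(seed)
        self.reset()

    def reset(self):
        self.t = 0
        self.length = self.rng.randint(3, 7)
        return 0.0  # dummy observation

    def step(self, action):
        self.t += 1
        done = self.t >= self.length
        return float(self.t), 1.0, done  # (next_obs, reward, done)


def n_step_returns(rewards, dones, bootstrap_value, gamma):
    """Discounted returns for one environment's rollout, bootstrapped from V(s_{t+n}).

    A done flag at step t zeroes everything after it, so returns never mix episodes.
    """
    returns = [0.0] * len(rewards)
    running = bootstrap_value
    for t in reversed(range(len(rewards))):
        running = rewards[t] + gamma * running * (1.0 - float(dones[t]))
        returns[t] = running
    return returns


def train(num_envs=2, n_steps=5, num_updates=4, gamma=0.99):
    envs = [ToyEnv(seed=i) for i in range(num_envs)]
    obs = [env.reset() for env in envs]
    policy = lambda o: 0  # hypothetical actor stub: always picks action 0
    value = lambda o: 0.0  # hypothetical critic stub: V(s) = 0 for every state
    update_log = []
    for _ in range(num_updates):
        # 1) Data collection: exactly n_steps per environment, ignoring episode ends.
        batch = [{"rewards": [], "dones": []} for _ in envs]
        for _ in range(n_steps):
            for i, env in enumerate(envs):
                next_obs, reward, done = env.step(policy(obs[i]))
                batch[i]["rewards"].append(reward)
                batch[i]["dones"].append(done)
                # Reset finished episodes immediately and keep collecting.
                obs[i] = env.reset() if done else next_obs
        # 2) Learning: one update on the num_envs * n_steps collected transitions.
        all_returns = [
            n_step_returns(batch[i]["rewards"], batch[i]["dones"], value(obs[i]), gamma)
            for i in range(num_envs)
        ]
        # A real agent would compute actor and critic losses from these returns here.
        update_log.append(sum(len(r) for r in all_returns))
    return update_log


if __name__ == "__main__":
    print(train())  # one entry per update: transitions used (2 envs * 5 steps)
```

Running the script prints `[10, 10, 10, 10]`: four learning steps, each on 2 × 5 transitions, although the toy episodes (3 to 7 steps long) routinely end in the middle of a rollout.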