MP2- Detail about the a2c algorithm

Re: MP2- Detail about the a2c algorithm

by Skander Moalla
Yes, you should take the mean over the K*n collected steps. Always collect K*n steps and use all of them for learning, even if a worker's episode terminates partway through the rollout.
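
To illustrate the collection rule, here is a minimal sketch, assuming K is the number of parallel workers and n the number of steps each one collects. `ToyEnv`, `collect`, and the fixed action are hypothetical stand-ins and not the assignment's actual code; the only point is that a terminated worker is reset and keeps collecting, so the batch always holds K*n steps.

```python
import numpy as np


class ToyEnv:
    """Hypothetical environment: an episode ends after `horizon` steps."""

    def __init__(self, horizon):
        self.horizon = horizon
        self.t = 0

    def reset(self):
        self.t = 0
        return 0.0

    def step(self, action):
        self.t += 1
        done = self.t >= self.horizon
        return float(self.t), 1.0, done  # observation, reward, done


def collect(envs, obs, n):
    """Collect exactly n steps from each of the K envs, resetting on termination."""
    K = len(envs)
    rewards = np.zeros((K, n))
    dones = np.zeros((K, n), dtype=bool)
    for t in range(n):
        for k, env in enumerate(envs):
            next_obs, rewards[k, t], dones[k, t] = env.step(action=0)
            # An episode ending mid-rollout does not shorten the batch:
            # reset this worker and keep collecting.
            obs[k] = env.reset() if dones[k, t] else next_obs
    return rewards, dones, obs


# Worker 0 terminates on its third step; worker 1 never terminates here.
envs = [ToyEnv(horizon=3), ToyEnv(horizon=100)]
obs = [env.reset() for env in envs]
rewards, dones, obs = collect(envs, obs, n=5)
```

The `dones` array records where episodes ended, so the learner can still tell episodes apart when computing returns over the full batch.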

With a fixed number of steps, it doesn't matter whether you take the mean or the sum: the two differ only by the constant factor K*n, which is absorbed into the learning rate.
Taking the mean, however, lets you vary K*n while keeping the same learning rate, which makes comparisons across settings more accurate.
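
The equivalence can be checked numerically. The sketch below uses a hypothetical per-step squared-error loss as a stand-in for the A2C loss, with plain gradient descent; `per_step_grads` and `sgd_update` are illustrative names. It shows that a step on the summed loss with learning rate `lr` matches a step on the mean loss with learning rate `lr * K * n`.

```python
import numpy as np


def per_step_grads(theta, x, y):
    # Gradient of 0.5 * (theta * x - y)**2 with respect to theta, one value per step.
    return (theta * x - y) * x


def sgd_update(theta, grad, lr):
    # Plain gradient descent step.
    return theta - lr * grad


rng = np.random.default_rng(0)
K, n = 4, 5  # assumed: K workers, n steps each
x = rng.normal(size=K * n)
y = rng.normal(size=K * n)
theta, lr = 0.3, 1e-3

g = per_step_grads(theta, x, y)

# Summed loss with learning rate lr ...
theta_sum = sgd_update(theta, g.sum(), lr)
# ... equals mean loss with learning rate lr * K * n.
theta_mean = sgd_update(theta, g.mean(), lr * K * n)
```

With the sum, the gradient magnitude grows with K*n, so the learning rate would need retuning whenever K or n changes; with the mean, the scale stays comparable across batch sizes.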