CS-456: MP2- Detail about the a2c algorithm

In the A2C algorithm from the lecture notes we find:

We implemented this, but instead of taking the raw sum, we take the mean, dividing by K*n (the actual count might be smaller, since a worker might take fewer than n steps). This seems more natural to us. Is it fine to do that?

Re: MP2- Detail about the a2c algorithm

by Skander Moalla - Monday, 13 May 2024, 10:52

Yes, you should take the mean over the K*n collected steps. You should always collect K*n steps and use them for learning, even when there is a worker who has observed an episode termination in the middle.
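
To illustrate the point about always collecting K*n steps, here is a minimal sketch, not the project's reference code. `ToyEnv` and `collect` are hypothetical names made up for this example, and the environment is a stand-in that ends an episode after a fixed number of steps. When a worker hits a termination, it resets and keeps stepping, so the batch always has shape (K, n).

```python
import numpy as np


class ToyEnv:
    """Hypothetical stand-in environment: an episode ends after `horizon` steps."""

    def __init__(self, horizon):
        self.horizon = horizon
        self.t = 0

    def reset(self):
        self.t = 0
        return 0.0  # dummy observation

    def step(self):
        self.t += 1
        done = self.t >= self.horizon
        return float(self.t), 1.0, done  # observation, reward, done


def collect(envs, n):
    """Collect exactly n steps from each of the K envs, resetting on termination."""
    K = len(envs)
    rewards = np.zeros((K, n))
    dones = np.zeros((K, n), dtype=bool)
    for k, env in enumerate(envs):
        for i in range(n):
            _, reward, done = env.step()
            rewards[k, i] = reward
            dones[k, i] = done
            if done:
                env.reset()  # start a fresh episode and keep collecting
    return rewards, dones


K, n = 3, 5
envs = [ToyEnv(horizon=h) for h in (2, 7, 100)]  # the first worker terminates mid-batch
for env in envs:
    env.reset()

rewards, dones = collect(envs, n)
print(rewards.shape)  # the batch is always (K, n), whatever the terminations
```

The `dones` array records where episodes ended inside the batch, so a return computation can avoid bootstrapping across an episode boundary.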

With a fixed number of steps, it doesn't matter whether you take the mean or the sum: the constant factor is absorbed into the learning rate.
By taking the mean, you can vary K*n while keeping the same learning rate, which makes runs with different K and n easier to compare.
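
As a small numerical check of the sum-versus-mean remark, here is a sketch that assumes a plain SGD update and uses random vectors as stand-ins for the per-step gradients (both are assumptions for illustration, not part of the assignment). A step on the summed gradient with learning rate lr matches a step on the mean gradient with learning rate lr * K * n.

```python
import numpy as np


def sgd_step(theta, grad, lr):
    """One plain SGD update (assumed optimizer for this illustration)."""
    return theta - lr * grad


K, n, dim = 4, 5, 3
rng = np.random.default_rng(0)
# Stand-in per-step gradients, one row for each of the K*n collected steps.
step_grads = rng.normal(size=(K * n, dim))

theta = np.zeros(dim)
lr = 1e-3

# Update using the raw sum of per-step gradients.
theta_sum = sgd_step(theta, step_grads.sum(axis=0), lr)

# Update using the mean, with the constant K*n folded into the learning rate.
theta_mean = sgd_step(theta, step_grads.mean(axis=0), lr * K * n)

print(np.allclose(theta_sum, theta_mean))
```

With the sum, the size of the update grows with K*n, so the learning rate would need retuning whenever K or n changes. With the mean, the scale of the update does not depend on the batch size.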