(Project) n-step A2C

Re: (Project) n-step A2C

by Skander Moalla
Yes, that's the idea. However, be careful: episodes in the different environments may end and reset at different timesteps, and the advantage computation should be robust to that. In particular, an n-step return should not bootstrap across an episode boundary.
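To make the point about episode boundaries concrete, here is a minimal sketch of n-step returns and advantages with a per-worker done mask. The function name, the argument names, and the (n_steps, n_workers) array layout are assumptions for illustration and not part of the project code. It also treats every done flag as a true termination, which is a simplification if your environments truncate episodes on a time limit.

```python
import numpy as np

def n_step_advantages(rewards, values, dones, last_values, gamma=0.99):
    """Hypothetical helper: n-step returns and advantages for one rollout.

    rewards, values, dones: arrays of shape (n_steps, n_workers).
    dones[t, k] is True if the action taken at step t ended the episode
    in worker k (so the next state belongs to a fresh episode).
    last_values: shape (n_workers,), the critic's value for the state
    reached after the final step of the rollout.
    """
    rewards = np.asarray(rewards, dtype=np.float64)
    values = np.asarray(values, dtype=np.float64)
    dones = np.asarray(dones, dtype=bool)
    n_steps = rewards.shape[0]

    returns = np.zeros_like(rewards)
    # Start from the bootstrap value and accumulate backwards in time.
    running = np.asarray(last_values, dtype=np.float64).copy()
    for t in reversed(range(n_steps)):
        # The mask zeroes the future term wherever an episode ended at step t,
        # so nothing leaks in from the episode that starts after the reset.
        not_done = 1.0 - dones[t].astype(np.float64)
        running = rewards[t] + gamma * not_done * running
        returns[t] = running

    advantages = returns - values
    return advantages, returns
```

Because the mask is applied independently per worker, one environment can reset in the middle of the rollout while the others keep bootstrapping from the critic as usual.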
Also, you want to keep a global counter of the total number of environment steps used to train the agent, incremented at each update by the number of workers times the number of steps per worker.
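As a rough sketch of that bookkeeping, assuming each update consumes one rollout of n_steps from each of n_workers environments (the function and variable names here are made up for illustration, and the rollout and update themselves are left out):

```python
def run_training(total_env_steps_budget, n_workers, n_steps):
    """Hypothetical training loop skeleton that only tracks the step counter."""
    total_env_steps = 0  # global count of environment steps across all workers
    n_updates = 0
    while total_env_steps < total_env_steps_budget:
        # ... collect n_steps transitions from each worker, then update the agent ...
        total_env_steps += n_workers * n_steps
        n_updates += 1
    return total_env_steps, n_updates
```

Logging and evaluating against total_env_steps rather than the number of updates keeps runs with different numbers of workers or steps per worker comparable.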