Yes, that's the idea. However, be careful: different parallel environments may have trajectories/episodes that end and reset at different timesteps, so the advantage computation should be robust to that (in particular, it should not bootstrap values across an episode boundary).
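As a minimal sketch of what that looks like in practice, here is a Generalized Advantage Estimation (GAE) pass over a rollout from several workers that masks the bootstrap term with the per-step done flags. The shapes and names (`rewards`, `values`, `dones`, `last_values`) are assumptions for illustration, not from any particular library:

```python
import numpy as np

def compute_gae(rewards, values, dones, last_values, gamma=0.99, lam=0.95):
    """GAE over a batch of parallel workers.

    rewards, values, dones: arrays of shape (T, N) for T steps and N workers.
    last_values: value estimates for the state after the last step, shape (N,).
    Episodes that terminate mid-rollout are handled by zeroing the bootstrap
    term via (1 - done), so advantages never leak across episode boundaries.
    """
    T, N = rewards.shape
    advantages = np.zeros((T, N), dtype=np.float32)
    gae = np.zeros(N, dtype=np.float32)
    next_values = last_values
    for t in reversed(range(T)):
        not_done = 1.0 - dones[t]                   # 0 where the episode ended at step t
        delta = rewards[t] + gamma * next_values * not_done - values[t]
        gae = delta + gamma * lam * not_done * gae  # running sum resets at episode ends
        advantages[t] = gae
        next_values = values[t]
    returns = advantages + values
    return advantages, returns
```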
You also want to keep a global counter of the total environment steps used to train the agent, incremented after each rollout by the number of workers times the steps collected per worker.
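A rough sketch of that bookkeeping, with hypothetical names (`num_workers`, `steps_per_worker`, `max_env_steps`) chosen just for illustration:

```python
num_workers = 8          # parallel environments (assumed for illustration)
steps_per_worker = 128   # rollout length per worker per update
max_env_steps = 10_000_000
total_env_steps = 0      # global counter of environment steps consumed so far

while total_env_steps < max_env_steps:
    # collect one rollout of steps_per_worker steps from each worker,
    # then update the policy on that batch (details omitted)
    total_env_steps += num_workers * steps_per_worker
```

This counter is what you typically plot against (e.g., reward vs. environment steps) and use for step-based schedules such as learning-rate or entropy annealing.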