MP2 clarification on return logging

by Skander Moalla

Hello,

Please find clarifications regarding logging for Project 2.

1. Most metrics, such as loss and entropy, can be computed on every collected batch of data (n*K steps), so they can be logged at regular intervals without any issues. The project asks for logging every 1k steps; "1k" here means the first multiple of n*K that is greater than or equal to 1000.

2. Returns can only be computed when episodes finish, so they may not be available at every 1k-step mark. As the project description says, you should:
a. For a single run, log the return as soon as an episode finishes (regardless of which worker it came from) to see the latest evolution. These logs will not fall at regular intervals. (If episodes from multiple workers finish at exactly the same time, just average their returns.)
b. When aggregating multiple seeds, however, you should use the regular 1k-step marks to aggregate the returns from the different seeds. You can do this in multiple ways, but you should describe your choice carefully in your report. For example, you can take the latest available return in each seed before each 1k mark and aggregate those (min, max, avg) (recommended); note that for some seeds, the latest return may have been recorded well before the mark.
You can also take all the returns recorded between two consecutive 1k marks and aggregate all of those; note that some seeds may not have any returns between the two marks, and if only one seed has returns in a window, showing min = max = avg in the shaded area may not make sense.
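
To make points 1 and 2b concrete, here is a minimal Python sketch of the logging interval and of the recommended aggregation (latest available return per seed at each mark). It is only an illustration, not the reference implementation: the function names, the (steps, returns) layout of each run, and the choice to count a return logged exactly at a mark as available at that mark are all assumptions.

```python
from bisect import bisect_right


def log_interval(n, K, target=1000):
    """Smallest multiple of the batch size n*K that is >= target (point 1)."""
    batch = n * K
    return -(-target // batch) * batch  # integer ceiling division


def latest_return_at_marks(steps, returns, marks):
    """For each mark, the latest return logged at a step <= mark.

    `steps` must be sorted in increasing order and aligned with `returns`.
    None means this seed has not finished any episode by that mark.
    """
    out = []
    for mark in marks:
        i = bisect_right(steps, mark)  # number of returns logged by the mark
        out.append(returns[i - 1] if i > 0 else None)
    return out


def aggregate_seeds(runs, marks):
    """Aggregate several seeds at each mark (point 2b, recommended option).

    `runs` is a list of (steps, returns) pairs, one per seed (assumed layout).
    Returns one (min, max, avg, num_seeds) tuple per mark, or None when no
    seed has a return yet.
    """
    per_seed = [latest_return_at_marks(s, r, marks) for s, r in runs]
    stats = []
    for j in range(len(marks)):
        vals = [p[j] for p in per_seed if p[j] is not None]
        if vals:
            stats.append((min(vals), max(vals), sum(vals) / len(vals), len(vals)))
        else:
            stats.append(None)
    return stats


# Hypothetical example: two seeds, marks at 1000 and 2000 steps.
runs = [
    ([300, 900, 1500], [10.0, 20.0, 30.0]),  # seed 0: (steps, returns)
    ([1200, 1900], [40.0, 50.0]),            # seed 1: no return before 1000
]
marks = [1000, 2000]
stats = aggregate_seeds(runs, marks)
```

Keeping the number of contributing seeds next to (min, max, avg) makes it easy to spot marks where only one seed has a return, and to say so in the report.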