Hello,
I have a question about the normalizations in part 3.4.
The project description states that the states should be normalized using only a running average, and that the intrinsic reward should also be normalized but this time clamped between -5 and 5.
However, the article linked in the project description [Burda et al., 2018] seems to state the opposite: the intrinsic reward is only normalized with a running average, while it is the next_state that is normalized and clamped between -5 and 5.
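For context, here is a minimal sketch of what I understand the article's scheme to be (the class name `RunningMeanStd`, the epsilon, and the shapes are my own choices, not from the paper or the project description):

```python
import numpy as np

class RunningMeanStd:
    """Tracks a running mean and variance via a parallel/batch Welford update."""
    def __init__(self, shape=()):
        self.mean = np.zeros(shape)
        self.var = np.ones(shape)
        self.count = 1e-4  # small prior count to avoid division by zero

    def update(self, x):
        batch_mean = x.mean(axis=0)
        batch_var = x.var(axis=0)
        batch_count = x.shape[0]
        delta = batch_mean - self.mean
        total = self.count + batch_count
        self.mean = self.mean + delta * batch_count / total
        m_a = self.var * self.count
        m_b = batch_var * batch_count
        self.var = (m_a + m_b + delta**2 * self.count * batch_count / total) / total
        self.count = total

# Observation normalization as I read the paper: whiten with running
# statistics, then clip to [-5, 5] before feeding the RND networks.
obs_rms = RunningMeanStd(shape=(4,))

def normalize_obs(obs):
    return np.clip((obs - obs_rms.mean) / np.sqrt(obs_rms.var + 1e-8), -5.0, 5.0)

# Intrinsic-reward normalization as I read the paper: divide by a running
# estimate of the standard deviation of the intrinsic returns, no clipping.
ret_rms = RunningMeanStd(shape=())

def normalize_intrinsic(rewards):
    return rewards / np.sqrt(ret_rms.var + 1e-8)
```

So in this reading, the clamping to [-5, 5] applies to the observations, not the reward, which is the opposite of what the project description says.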
Which one should we implement?
Thanks for your help,
Gaston Wolfart