Hi,
I have two questions about the RND (Random Network Distillation) part of the DQN.
1. I have trouble normalizing my states and rewards correctly. How can I compute a running estimate of the standard deviation?
2. It is stated that we need to clamp the reward to the range [-5, 5]. It is also written that the auxiliary reward must be balanced by a factor. Should we multiply by that factor before or after clamping?
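
To make both questions concrete, here is a minimal sketch of what I have in mind. It assumes scalar inputs (for a state vector I would keep one estimator per dimension) and uses Welford's online algorithm for the running standard deviation. The names `RunningStd`, `clamp`, `scale_then_clamp` and `clamp_then_scale` are my own placeholders, and `factor` stands for the balancing factor; I am not sure this is the intended approach.

```python
import math


class RunningStd:
    """Running mean and standard deviation via Welford's online algorithm."""

    def __init__(self):
        self.count = 0
        self.mean = 0.0
        self.m2 = 0.0  # running sum of squared deviations from the mean

    def update(self, x):
        # Fold one new sample into the running statistics.
        self.count += 1
        delta = x - self.mean
        self.mean += delta / self.count
        self.m2 += delta * (x - self.mean)

    @property
    def std(self):
        # Population standard deviation; 0.0 until two samples have been seen.
        if self.count < 2:
            return 0.0
        return math.sqrt(self.m2 / self.count)


def clamp(x, low=-5.0, high=5.0):
    # Restrict x to the interval [low, high].
    return max(low, min(high, x))


def scale_then_clamp(r, factor):
    # Option A: apply the balancing factor first, then clamp.
    return clamp(factor * r)


def clamp_then_scale(r, factor):
    # Option B: clamp first, then apply the balancing factor.
    return factor * clamp(r)
```

The two orders only differ when the reward falls outside the clamp range: for example, with a reward of 20 and a factor of 0.5, option A gives 5.0 while option B gives 2.5.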
Thibault Schiesser