MP1 - RND Normalization

by Gaston Emil Wolfart,
Number of replies: 2

Hello,

I have a question about the normalizations in part 3.4.

The project description states that the states should be normalized using only a running average, and that the intrinsic reward should also be normalized, but additionally clamped between -5 and 5.

However, the article linked in the project description [Burda et al., 2018] seems to state the opposite: the intrinsic reward is only normalized with a running average, and it is next_state that is normalized and clamped between -5 and 5.


Which one should we implement?

Thanks for your help,

Gaston Wolfart


In reply to Gaston Emil Wolfart

Re: MP1 - RND Normalization

by Lucas Louis Gruaz,
Hello,

Yes, you are right. This was a mistake on our side, but both variants work fine for the given task. You can implement either of the two options; both will be considered correct.
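
For reference, the two options discussed in this thread could be sketched roughly as follows. This is only a minimal illustration, not the official solution: the class `RunningMeanStd` and the two helper functions are hypothetical names, and the sketch assumes that "normalized with a running average" means subtracting a running mean and dividing by a running standard deviation.

```python
import numpy as np


class RunningMeanStd:
    """Running mean and variance, updated one batch at a time (hypothetical helper)."""

    def __init__(self, shape=()):
        self.mean = np.zeros(shape, dtype=np.float64)
        self.var = np.ones(shape, dtype=np.float64)
        self.count = 1e-4  # small prior count avoids division by zero

    def update(self, batch):
        # Merge the batch statistics into the running statistics
        # (parallel mean/variance combination).
        batch = np.asarray(batch, dtype=np.float64)
        batch_mean = batch.mean(axis=0)
        batch_var = batch.var(axis=0)
        batch_count = batch.shape[0]

        delta = batch_mean - self.mean
        total = self.count + batch_count
        new_mean = self.mean + delta * batch_count / total
        m2 = (self.var * self.count + batch_var * batch_count
              + delta ** 2 * self.count * batch_count / total)
        self.mean, self.var, self.count = new_mean, m2 / total, total

    def normalize(self, x, clip=None):
        # Standardize with the running statistics; optionally clamp to [-clip, clip].
        z = (np.asarray(x, dtype=np.float64) - self.mean) / np.sqrt(self.var + 1e-8)
        return z if clip is None else np.clip(z, -clip, clip)


def normalize_paper_style(state_stats, reward_stats, next_state, intrinsic_reward):
    # Variant attributed to the article in this thread:
    # next_state is normalized and clamped to [-5, 5], the reward is only normalized.
    return (state_stats.normalize(next_state, clip=5.0),
            reward_stats.normalize(intrinsic_reward))


def normalize_description_style(state_stats, reward_stats, next_state, intrinsic_reward):
    # Variant from the project description:
    # the state is only normalized, the reward is normalized and clamped to [-5, 5].
    return (state_stats.normalize(next_state),
            reward_stats.normalize(intrinsic_reward, clip=5.0))
```

In both variants the running statistics would be updated with newly observed states and intrinsic rewards (via `update`) before or after each normalization step; which order to use is a design choice this sketch leaves open.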

Best,
Lucas
In reply to Lucas Louis Gruaz

Re: MP1 - RND Normalization

by Maria Yuffa Meshcheryakova,

Dear Lucas,

I am slightly confused: aren't the states we are considering much smaller than 5 (from -1.2 to 0.6 and from -0.07 to 0.07), unlike those of the authors, who explore different environments?

Thank you in advance for clarification!

Best wishes,

Maria
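
One way to look at the ranges mentioned above, assuming the clamp is applied after the states have been standardized (mean subtracted, divided by the standard deviation): the bound of 5 is then expressed in units of standard deviations, not in raw state units. The sketch below is only an illustration under that assumption. It samples states uniformly from the two quoted ranges (real trajectories will not be uniform) and uses batch statistics as a stand-in for running ones.

```python
import numpy as np

rng = np.random.default_rng(0)

# Raw state ranges quoted in the post above.
low = np.array([-1.2, -0.07])
high = np.array([0.6, 0.07])

# Assumption: uniformly sampled states, used only for illustration.
states = rng.uniform(low, high, size=(10_000, 2))

# Batch mean/std as a stand-in for the running statistics.
mean = states.mean(axis=0)
std = states.std(axis=0)

# After standardization both dimensions have roughly unit scale,
# so the clamp at +/-5 acts on standard-deviation units.
z = (states - mean) / (std + 1e-8)
z_clipped = np.clip(z, -5.0, 5.0)

max_abs_z = np.abs(z).max()
fraction_clipped = np.mean(z != z_clipped)

print("largest |z|:", max_abs_z)
print("fraction of entries changed by the clamp:", fraction_clipped)
```

Under these assumptions the clamp would rarely or never be active, since it only guards against states far from the running mean.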