CS-456: Mountain car mini project

Hi,

I have questions about the RND part of the DQN.

1. I have trouble normalizing my states and rewards correctly. How can I compute a running estimate of the standard deviation?

2. The instructions say to clamp the reward between -5 and 5, and also that the auxiliary reward must be balanced by a factor. Should we multiply by the factor before or after the clamp?

Thibault Schiesser

Re: Mountain car mini project

by Lucas Louis Gruaz, Monday, 27 May 2024, 09:17

1. You can either store your states at each step in a FIFO queue and compute the mean and std over the stored batch, or compute them online with a formula like new_average = old_average * (n-1)/n + new_value/n (and a similar formula for the variance).
2. You should multiply after the clamp.
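
For illustration, the two points above could be sketched as follows. This is only a minimal sketch, not the project's reference solution: the names RunningStats and scaled_rnd_reward, the eps term, and the fallback std of 1.0 are hypothetical choices. The variance update uses Welford's algorithm, which is one common candidate for the "similar formula" mentioned above, and it assumes the population variance (dividing by n) and scalar inputs. For state vectors the same updates should work element-wise on NumPy arrays.

```python
import math


class RunningStats:
    """Online mean and std of a stream of scalars (Welford's algorithm)."""

    def __init__(self):
        self.n = 0       # number of samples seen so far
        self.mean = 0.0  # running mean
        self.m2 = 0.0    # running sum of squared deviations from the mean

    def update(self, x):
        self.n += 1
        delta = x - self.mean
        # Equivalent to old_average * (n-1)/n + new_value/n
        self.mean += delta / self.n
        self.m2 += delta * (x - self.mean)

    @property
    def std(self):
        # Assumption: fall back to 1.0 until at least two samples are seen
        if self.n < 2:
            return 1.0
        return math.sqrt(self.m2 / self.n)  # population std (assumption)


def scaled_rnd_reward(raw, stats, factor, clip=5.0, eps=1e-8):
    """Normalize, clamp to [-clip, clip], then multiply by the factor."""
    normalized = (raw - stats.mean) / (stats.std + eps)
    clamped = max(-clip, min(clip, normalized))
    return factor * clamped  # the factor is applied after the clamp


stats = RunningStats()
for value in [1.0, 2.0, 3.0, 4.0]:
    stats.update(value)
reward = scaled_rnd_reward(100.0, stats, factor=0.5)
```

Whether the statistics are updated with the current sample before or after it is normalized is left to the caller here. The FIFO-queue variant would instead keep the last few values (for example in a collections.deque with a maxlen) and recompute the mean and std over that buffer.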