CS-456: DQN project: What should we fix for randomness?

Hi everyone. I am confused about what we should fix for randomness. For question 2 in the project, I think we need to seed the environment differently for each evaluation episode (with the sequence of seeds itself fixed), or we will get the same result in every episode. But for question 3, I have a few questions:

1. Do we need to set the environment to different seeds for each training episode? If yes, do we need to fix the randomness of the seed sequence?

2. Do we need to set the environment to different seeds for each evaluation episode? If yes, do we need to fix the randomness of the seed sequence?

3. Do we need to fix the randomness of exploration-exploitation for each training episode?

Re: DQN project: What should we fix for randomness?

by Titouan Alexis Arthur Renard - Friday, 19 May 2023, 16:57

Hey there, here is exactly what we expect:

1) Training seeds: We want the training to be reproducible, so we ask you to seed the environment in a reproducible way for each training episode. The same is true for the exploration/exploitation ratio. The seed sequence should not be random. That said, since we ask you to train three times and average evaluation performance across the three training runs, make sure you use different, but still deterministic, seed sequences for the three traces. It can be as simple as episode_number + some_large_constant * trace_number; it just needs to be a different number for each training episode, and not be the same for the same episode across training runs.
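
As an illustration, the seeding scheme described above could be sketched as follows. This is only a minimal sketch, not the official solution: the names `SEED_STRIDE`, `training_seed`, and `make_exploration_rng` are hypothetical, the value of the constant is arbitrary (it only has to be larger than the number of episodes per run), and seeding the exploration generator with the trace number is one assumed way to make the epsilon-greedy choices reproducible.

```python
import random

# Hypothetical "large constant": must exceed the number of training
# episodes per run so that seeds from different traces never overlap.
SEED_STRIDE = 1_000_000


def training_seed(episode_number: int, trace_number: int) -> int:
    """Deterministic environment seed for one training episode.

    Different for every episode within a run, and different for the
    same episode index across runs (traces).
    """
    return episode_number + SEED_STRIDE * trace_number


def make_exploration_rng(trace_number: int) -> random.Random:
    """Dedicated generator for exploration decisions (an assumption:
    one generator per trace, seeded with the trace number)."""
    return random.Random(trace_number)


def should_explore(rng: random.Random, epsilon: float) -> bool:
    """Epsilon-greedy coin flip drawn from the seeded generator."""
    return rng.random() < epsilon
```

In a training loop, the environment would then be reset with `training_seed(episode, trace)` at the start of each episode (for a Gymnasium-style environment, assuming that is what the project uses, via `env.reset(seed=...)`), and every explore/exploit decision would go through the per-trace generator rather than the global `random` module.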

2) Evaluation seeds: This is a bit different. We want to make sure we always evaluate on the same seed sequence, because we want our results to be meaningfully comparable. So when you evaluate for 50 episodes, make sure you always use the same 50 seeds for the environment, every time you evaluate.
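
A fixed evaluation seed list could look like the sketch below. Everything named here is hypothetical: `EVAL_SEED_OFFSET` is an arbitrary constant, and `run_episode` stands in for whatever function in your own code plays one greedy episode on an environment seeded with the given value and returns the episode return.

```python
from typing import Callable, Sequence

EVAL_EPISODES = 50
# Hypothetical offset; any fixed value works as long as it never changes.
EVAL_SEED_OFFSET = 10_000_000

# The same 50 seeds are reused for every evaluation.
EVAL_SEEDS = tuple(EVAL_SEED_OFFSET + i for i in range(EVAL_EPISODES))


def evaluate(run_episode: Callable[[int], float],
             seeds: Sequence[int] = EVAL_SEEDS) -> float:
    """Average return over the fixed evaluation seeds.

    `run_episode(seed)` is assumed to seed the environment with `seed`,
    play one episode without exploration, and return its total reward.
    """
    returns = [run_episode(seed) for seed in seeds]
    return sum(returns) / len(returns)
```

Because the seed list is a module-level constant, two evaluations at different points in training (or of different training runs) see exactly the same 50 starting conditions, so differences in the average come from the agent rather than from the environment's randomness.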