Hi everyone. I am confused about which sources of randomness we should fix. For question 2 in the project, I think we need to seed the environment differently for each evaluation episode (while keeping the sequence of seeds itself fixed); otherwise every episode will give the same results. For question 3, I have a few questions:
1. Do we need to seed the environment differently for each training episode? If yes, do we need to fix the sequence of seeds?
2. Do we need to seed the environment differently for each evaluation episode? If yes, do we need to fix the sequence of seeds?
3. Do we need to fix the randomness of the exploration-exploitation decisions in each training episode?
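
To make concrete what I mean by "different seeds per episode, but a fixed sequence of seeds", here is a minimal sketch. It is only an illustration under my own assumptions: `make_episode_seeds` and `run_episode` are hypothetical names, not part of the project code, and `run_episode` is a stand-in for a real environment rollout.

```python
import random


def make_episode_seeds(master_seed: int, n_episodes: int) -> list[int]:
    """Derive a reproducible list of distinct per-episode seeds from one master seed."""
    rng = random.Random(master_seed)
    # sample() without replacement guarantees the seeds are all different
    return rng.sample(range(2**31), n_episodes)


def run_episode(seed: int, n_steps: int = 5) -> float:
    """Hypothetical stand-in for one episode; the outcome depends only on the seed.

    With a Gymnasium-style environment (an assumption about the project setup),
    the seed would instead be passed as env.reset(seed=seed).
    """
    env_rng = random.Random(seed)
    return sum(env_rng.random() for _ in range(n_steps))


# Same master seed -> same seed sequence -> same per-episode results across runs
eval_seeds = make_episode_seeds(master_seed=0, n_episodes=3)
returns_a = [run_episode(s) for s in eval_seeds]
returns_b = [run_episode(s) for s in make_episode_seeds(master_seed=0, n_episodes=3)]
print(returns_a == returns_b)
```

Here each episode gets its own seed, so the episodes differ from one another, but rerunning the whole evaluation with the same master seed reproduces the same sequence. My questions above are about whether this is the setup expected for training and evaluation in question 3.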