CS-456: balancing the rewards

Hello, regarding the RND part, we are asked to balance the environment reward and the reward we get from our two networks using a reward factor. First, I am not sure what you mean by environment reward: is it the -1 penalty we always get from the environment? Second, I achieve really good performance without taking the -1's into account, as you can see from the plot below. Should I still implement the combination of the two rewards?

Re: balancing the rewards

by Anja Surina - Tuesday, 21 May 2024, 12:05

Hello!
Regarding the first question, yes, you are correct: the environment reward is the -1 penalty you get from the environment at every step.
Regarding your second question, I am not sure whether a plot is attached to your question; at least I am not able to see it. You can present two sets of experiments, one with only your auxiliary reward function and one with the combination of both, and argue that using only the auxiliary reward is better.
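
For illustration, the two experiment settings could be sketched as below. This is only a minimal sketch, not the course's reference solution: the function names, the `use_env_reward` flag, and the choice of mean squared error between the two network outputs as the auxiliary reward are assumptions, and the exact formula in the assignment may differ (for example, it may normalize the auxiliary reward first).

```python
def rnd_aux_reward(target_features, predictor_features):
    """Auxiliary (RND-style) reward: mean squared error between the outputs
    of the two networks for the same state.
    Assumption: the outputs are given as plain lists of floats."""
    if len(target_features) != len(predictor_features):
        raise ValueError("feature vectors must have the same length")
    squared_errors = [
        (t - p) ** 2 for t, p in zip(target_features, predictor_features)
    ]
    return sum(squared_errors) / len(squared_errors)


def total_reward(env_reward, aux_reward, reward_factor, use_env_reward=True):
    """Combine the environment reward with the scaled auxiliary reward.
    With use_env_reward=False, only the scaled auxiliary reward is used
    (hypothetical flag for the 'auxiliary only' experiment)."""
    scaled_aux = reward_factor * aux_reward
    if use_env_reward:
        return env_reward + scaled_aux
    return scaled_aux


# One step with the -1 environment penalty and a hypothetical reward factor.
aux = rnd_aux_reward([1.0, 0.0], [0.0, 0.0])
combined = total_reward(-1.0, aux, reward_factor=2.0)
aux_only = total_reward(-1.0, aux, reward_factor=2.0, use_env_reward=False)
```

Running the same training loop once with `use_env_reward=True` and once with `use_env_reward=False` would give the two sets of results to compare.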