Hello, regarding the RND part: we are asked to balance the environment reward against the resulting reward we get from our two networks, using a reward factor. First, I am not sure what you mean by environment reward. Is it the -1 penalty we always get from the environment? Second, I achieve really good performance without taking the -1's into account, as you can see from the plot below. Should I still implement the combination of the two rewards?
Hello!
Regarding the first question: yes, you are correct. The environment reward is the -1 penalty you get from the environment at every step.
Regarding your second question, I cannot see a plot attached to it. You can present two sets of experiments, one with only your auxiliary reward function and one with the combination of both, and argue that using only the auxiliary reward is better.
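In case it helps with setting up the two experiments, here is a minimal sketch of one way to combine the rewards. It assumes the RND reward is the mean squared error between the target and predictor network outputs, and that the reward factor simply scales that term before it is added to the environment reward. The function names and the additive form are my assumptions, not necessarily what the assignment prescribes.

```python
import numpy as np

def rnd_intrinsic_reward(target_out, predictor_out):
    # Assumed form: per-sample mean squared error between the outputs of the
    # fixed target network and the trained predictor network.
    diff = np.asarray(target_out) - np.asarray(predictor_out)
    return np.mean(diff ** 2, axis=-1)

def combine_rewards(env_reward, rnd_reward, reward_factor):
    # Assumed additive combination: the environment reward (-1 per step here)
    # plus the RND reward scaled by the reward factor.
    return env_reward + reward_factor * rnd_reward

# Hypothetical values for a single step.
r_int = rnd_intrinsic_reward([[1.0, 2.0]], [[1.0, 0.0]])
r_total = combine_rewards(-1.0, r_int, reward_factor=0.5)
```

For the auxiliary-only experiment you would train on `r_int` directly; for the combined experiment you would train on `r_total` and, if you like, try a few values of `reward_factor`.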