CS-456: balancing the rewards

Hello!
Regarding the first question: yes, you are correct. The environment reward is the -1 penalty you receive from the environment at every step.
Regarding your second question, I am not sure whether a plot was attached to your question; at least, I am not able to see it. You can present two sets of experiments: one using only your auxiliary reward function and one using the combination of both rewards. Then argue that using only the auxiliary reward is better.
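The two experimental setups can be sketched roughly as follows. This sketch assumes an episodic task where the agent trains on one scalar reward per step. The names `make_reward_fn`, the `mode` strings, the `aux_weight` balancing coefficient and the toy distance-based auxiliary reward are all hypothetical, chosen only for illustration.

```python
from typing import Callable, Dict, List

ENV_STEP_PENALTY = -1.0  # environment reward: -1 at every step

def make_reward_fn(
    mode: str,
    aux_reward: Callable[[int], float],
    aux_weight: float = 1.0,  # hypothetical coefficient balancing the two rewards
) -> Callable[[float, int], float]:
    """Return a function mapping (env_reward, state) to the training reward.

    mode="aux_only": ignore the environment reward, use only the auxiliary one.
    mode="combined": env_reward + aux_weight * aux_reward(state).
    """
    if mode == "aux_only":
        return lambda env_reward, state: aux_reward(state)
    if mode == "combined":
        return lambda env_reward, state: env_reward + aux_weight * aux_reward(state)
    raise ValueError(f"unknown mode: {mode!r}")

# Hypothetical auxiliary reward: penalize distance to a goal at position 10.
def distance_aux(state: int) -> float:
    return -0.1 * abs(10 - state)

# Toy trajectory of states visited by the agent in one episode.
trajectory: List[int] = [0, 4, 8, 10]

# Undiscounted return of the same trajectory under each setup.
totals: Dict[str, float] = {}
for mode in ("aux_only", "combined"):
    reward_fn = make_reward_fn(mode, distance_aux)
    totals[mode] = sum(reward_fn(ENV_STEP_PENALTY, s) for s in trajectory)
    print(mode, totals[mode])
```

In a real comparison you would train one agent per `mode` (and possibly several values of `aux_weight`) and plot the resulting learning curves side by side.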
