Hello!
Regarding the first question: yes, you are correct. The environment reward is the -1 penalty you receive from the environment at every step.
Regarding your second question: I am not sure whether a plot was attached to your question; at least, I am not able to see one. You could present two sets of experiments, one using only your auxiliary reward function and one using the combination of both, and argue that using only the auxiliary reward is better.
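To make the two experimental conditions concrete, here is a minimal sketch of how the per-step training reward could be switched between them. The names `step_reward`, `episode_return`, and the mode strings are hypothetical, and the auxiliary reward values are made up for illustration; only the -1 per-step environment penalty comes from the discussion above.

```python
# Per-step environment reward: the -1 penalty discussed above.
ENV_STEP_PENALTY = -1.0


def step_reward(aux_reward, mode):
    """Return the training reward for one step under one condition.

    mode is a hypothetical switch:
      "aux_only" -> only the auxiliary reward is used
      "combined" -> environment penalty plus auxiliary reward
    """
    if mode == "aux_only":
        return aux_reward
    if mode == "combined":
        return ENV_STEP_PENALTY + aux_reward
    raise ValueError(f"unknown mode: {mode}")


def episode_return(aux_rewards, mode):
    """Sum the per-step training rewards over one episode."""
    return sum(step_reward(r, mode) for r in aux_rewards)


# Illustrative auxiliary rewards for a three-step episode (made-up values).
aux_rewards = [0.5, 0.5, 2.0]
print(episode_return(aux_rewards, "aux_only"))
print(episode_return(aux_rewards, "combined"))
```

When comparing the two conditions, it probably makes sense to report the same evaluation metric for both (for example, the original environment return), since the training rewards themselves are on different scales.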