For the n-step A2C part of the project it says:
I am not sure I am understanding this correctly. When we did n-step SARSA we had to "look ahead" n steps: to update Q(s_0, a_0) we had to step all the way to s_n, and to update Q(s_1, a_1) we had to step all the way to s_{n+1}, on the same trajectory.
Isn't that exactly what we want to do here, just computing advantages instead of Q-value updates? I think it would be helpful to see the n-step A2C algorithm written out (currently, in lecture 8, we are only given the algorithm for n=1).
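
To make my reading concrete, here is a rough sketch of how I would compute the advantages if the n-step look-ahead works the same way as in n-step SARSA, i.e. A_t = r_{t+1} + gamma * r_{t+2} + ... + gamma^{n-1} * r_{t+n} + gamma^n * V(s_{t+n}) - V(s_t). This is only my interpretation, not the algorithm from the lecture. The function name, the argument names, and the assumption that we have one finished episode with the critic's value estimates already computed are all my own.

```python
def n_step_advantages(rewards, values, n, gamma=0.99):
    """Sketch of n-step advantage estimates for one finished episode.

    Assumed (hypothetical) inputs:
      rewards[t] -- reward received after acting in state s_t
      values[t]  -- critic estimate V(s_t), for t = 0 .. T-1
    The state after the last step is terminal, so its value is taken as 0.
    """
    T = len(rewards)
    advantages = []
    for t in range(T):
        # Look ahead at most n steps, truncated at the end of the episode.
        end = min(t + n, T)
        # Discounted sum of the rewards actually observed in the window.
        g = sum(gamma ** (k - t) * rewards[k] for k in range(t, end))
        # Bootstrap from the critic only if the window stopped before the end.
        if end < T:
            g += gamma ** (end - t) * values[end]
        advantages.append(g - values[t])
    return advantages


adv = n_step_advantages([1.0, 1.0, 1.0], [0.5, 0.5, 0.5], n=2, gamma=0.5)
print(adv)
```

With n=1 this should reduce to the one-step advantage r_{t+1} + gamma * V(s_{t+1}) - V(s_t), which is what I understand the lecture 8 version to use. Is this the intended generalization?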