CS-456: MP2 - A2C Loss plots

Hello, it's all correct. Your critic is fitting a moving target as your policy gets better and better. It is only expected to converge when the policy has converged. So it's normal to see spikes while the agent is still training.

For the project, in Agent 1 without stochasticity we expect your agent to converge quickly and thus to observe your critic converge too (value loss less than 1e-4). You will likely have to adapt the scale of your y-axis to observe it.

ANN Forum

MP2 - A2C Loss plots

Re: MP2 - A2C Loss plots