Re: Zero-out the accumulated gradients

by Nicolas Talabot
Hi,

This is mostly a peculiarity of the PyTorch framework: the gradients computed at one iteration are not automatically discarded after they are used for the gradient-descent update, so we have to "zero them out" by hand.
If we don't, the gradients computed at the next iteration will be added to the previous ones instead of replacing them.
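For instance, here is a toy illustration of that accumulation (just an illustrative snippet, not from the exercise code):

    import torch

    w = torch.tensor(1.0, requires_grad=True)
    (w * 2).backward()
    print(w.grad)     # tensor(2.)
    (w * 2).backward()
    print(w.grad)     # tensor(4.)  -> the new gradient was added, not replaced
    w.grad.zero_()    # roughly what optimizer.zero_grad() does for every parameter
    print(w.grad)     # tensor(0.)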

Zeroing them out is simply a way of telling PyTorch that we no longer care about the previously computed gradients, so they can be safely discarded.
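In practice, the zeroing is usually the first step of each training iteration. A minimal sketch (with placeholder model, loss, and data; the names are not from the exercise code):

    import torch

    model = torch.nn.Linear(10, 1)                              # placeholder model
    optimizer = torch.optim.SGD(model.parameters(), lr=0.01)
    loss_fn = torch.nn.MSELoss()
    x, y = torch.randn(32, 10), torch.randn(32, 1)              # placeholder batch

    for iteration in range(100):
        optimizer.zero_grad()         # discard the gradients from the previous iteration
        loss = loss_fn(model(x), y)   # forward pass
        loss.backward()               # backward pass: gradients are added to each parameter's .grad
        optimizer.step()              # gradient-descent update using the current gradients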