Zero-out the accumulated gradients

by Kilian Jérémy Thomas

Hello,

In the exercise session, one of the key steps of the CNN algorithms is "zeroing-out".

I couldn't find any reference to this term in the lecture. What is the purpose of it?

Thank you

In reply to Kilian Jérémy Thomas

Re: Zero-out the accumulated gradients

by Nicolas Talabot
Hi,

That is mostly a peculiarity of the PyTorch framework: the gradients computed at one iteration are not automatically discarded after they have been used for the gradient descent step, so we have to "zero them out" by hand.
If we don't, the gradients computed at the next iteration will be added on top of the previous ones instead of replacing them.

It is simply a way of telling PyTorch we don't care anymore about the computed gradients, so they can be discarded.
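For reference, here is a minimal sketch of what a typical PyTorch training iteration looks like (the model, loss, and data here are just placeholders to show where the zeroing-out happens):

import torch
import torch.nn as nn

# Placeholder model, loss, and optimizer, just to illustrate the loop structure.
model = nn.Linear(10, 1)
criterion = nn.MSELoss()
optimizer = torch.optim.SGD(model.parameters(), lr=0.01)

x = torch.randn(32, 10)  # dummy batch of inputs
y = torch.randn(32, 1)   # dummy targets

for iteration in range(100):
    optimizer.zero_grad()          # discard the gradients from the previous iteration
    prediction = model(x)
    loss = criterion(prediction, y)
    loss.backward()                # gradients are *added* into the .grad buffers of the parameters
    optimizer.step()               # gradient descent update using the freshly computed gradients

If you comment out the optimizer.zero_grad() line, the .grad buffers keep growing from one iteration to the next, so each update is effectively made with a sum of all past gradients rather than the current one.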