Updating all weights (Lecture 4)

by Jørn Bøni Hofstad -

The lecture mentioned some kind of problem when all weights (in a layer?) are updated with the same delta_i. But I don't quite see what the issue is. Why do we suddenly need several iterations to move in the direction we want? And why is this solved by having a g(a) that can take negative values?

In reply to Jørn Bøni Hofstad

Re: Updating all weights (Lecture 4)

by Jørn Bøni Hofstad -

Is it because the positive values force any update to be of the form

delta w = (+/-) [a, b]

where a and b are both positive numbers, so that if we want to go in the direction

[1, -1]

we will (in the best case) need to take two steps,

+[1, 0]

-[0, 1]

and in most cases have to take a lot more?
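(Not from the lecture, just to make the sign argument concrete.) A minimal Python/NumPy sketch of a single neuron with squared-error loss; all names and numbers here are illustrative assumptions:

import numpy as np

rng = np.random.default_rng(0)

# If the previous layer uses a strictly positive activation (e.g. the logistic
# sigmoid), every input x_i to this neuron is positive.
x = rng.uniform(0.1, 1.0, size=2)   # positive "activations" from the previous layer
w = rng.normal(size=2)              # weights of one neuron
t = 0.5                             # arbitrary target

y = w @ x
delta = y - t                       # shared error term delta for this neuron
grad = delta * x                    # dL/dw_i = delta * x_i

# Every x_i > 0, so all components of grad share the sign of delta:
print("gradient:", grad, "signs:", np.sign(grad))

# A single step delta_w = -lr * grad is therefore of the form +/-[a, b] with
# a, b > 0, so moving the weights towards a direction like [+1, -1] forces the
# sign of delta to flip between steps, i.e. a zig-zag over several iterations.

# With an activation whose output can be negative (e.g. tanh), the inputs x_i
# can have mixed signs, and so can the gradient components:
x_tanh = np.tanh(rng.normal(size=2))
print("signs with tanh inputs:", np.sign((w @ x_tanh - t) * x_tanh))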


In reply to Jørn Bøni Hofstad

Re: Updating all weights (Lecture 4)

by Alexandru Mocanu -

Yes, needing many steps to move in some direction is one of the problems.

Another one is the "bias problem". Namely, we want the weights of a neuron to stay in a certain regime (zero-mean, for example) for training to go well (no vanishing gradients, and not staying only in the linear region of the activation function). However, if all the weights are shifted in the same direction, the mean value of the neuron's weights will also shift in that direction, moving us away from the desired training regime.
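To make the mean-shift point concrete, here is a small sketch of my own (not from the lecture), again in Python/NumPy, comparing one gradient step with all-positive inputs versus roughly zero-mean inputs; the function name and numbers are made up:

import numpy as np

rng = np.random.default_rng(1)

def mean_shift_per_step(x, w, target, lr=0.1):
    # One gradient step for all weights of a single linear neuron with squared-error loss.
    delta = w @ x - target          # shared error term
    step = -lr * delta * x          # per-weight update; same sign as -delta when all x_i > 0
    return step.mean()              # how much the mean of the weights moves

w = rng.normal(size=10)
x_pos = rng.uniform(0.0, 1.0, size=10)   # sigmoid-like: all inputs positive
x_zm = rng.normal(0.0, 0.5, size=10)     # tanh-like: roughly zero-mean inputs

print("mean shift, positive inputs :", mean_shift_per_step(x_pos, w, target=1.0))
print("mean shift, zero-mean inputs:", mean_shift_per_step(x_zm, w, target=1.0))

# With all-positive inputs the update drags the mean of the weights in the
# direction of -delta at every step; with roughly zero-mean inputs the mean
# of the weights barely moves.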