CS-456: Set 1, Ex 4

Why is there no 1/P term in the update equation, where P is the number of patterns? It was included in Ex 1 but omitted in Ex 4.

Re: Set 1, Ex 4

by Bernd Albert Illing - Monday, 16 March 2020, 10:24 AM

Hi Artur,

The 1/P term in Ex 1 is only there to normalise the cost function, i.e. to compute the mean squared error instead of the summed squared error over patterns. As you can see by calculating the gradient, this factor doesn't change the direction of the gradient, only its magnitude.
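
To make this concrete, here is a sketch that assumes the cost is the usual squared error. The symbols $E_{\text{sum}}$, $E_{\text{mean}}$, $w$, $t^\mu$ and $y^\mu$ are my own notation and may differ from the exercise sheet:

$$E_{\text{sum}}(w) = \frac{1}{2}\sum_{\mu=1}^{P}\left(t^\mu - y^\mu(w)\right)^2, \qquad E_{\text{mean}}(w) = \frac{1}{P}\,E_{\text{sum}}(w)$$

$$\nabla_w E_{\text{mean}}(w) = \frac{1}{P}\,\nabla_w E_{\text{sum}}(w)$$

Since $1/P$ is a positive constant that does not depend on $w$, the two gradients point in the same direction and differ only in length.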

The direction of the update is given by the (negative) gradient. The magnitude of the update, however, is arbitrary in any case and is controlled by the learning rate $\eta$. Thus the factor 1/P in the cost function is somewhat redundant for the learning rule and can be absorbed into $\eta$.
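
If you want to check this numerically, the following is a minimal sketch, not the code from the exercises. It assumes a single linear unit $y = w x$ with a squared-error cost; the toy data and the names `grad_sum`, `grad_mean` and `descend` are hypothetical. Gradient descent on the summed error with learning rate `eta` should follow the same trajectory as gradient descent on the mean error with learning rate `eta * P`.

```python
# Hypothetical toy data: P = 4 patterns, one scalar input each.
# Assumed model (not from the exercise sheet): linear unit y = w * x.
xs = [0.0, 1.0, 2.0, 3.0]
ts = [1.0, 3.0, 5.0, 7.0]
P = len(xs)


def grad_sum(w):
    """Gradient w.r.t. w of the summed squared error 0.5 * sum((t - w*x)^2)."""
    return sum(-(t - w * x) * x for x, t in zip(xs, ts))


def grad_mean(w):
    """Gradient of the mean squared error: the summed gradient scaled by 1/P."""
    return grad_sum(w) / P


def descend(grad, eta, w0=0.0, steps=20):
    """Plain gradient descent; returns the list of visited weights."""
    w = w0
    trajectory = [w]
    for _ in range(steps):
        w = w - eta * grad(w)
        trajectory.append(w)
    return trajectory


eta = 0.01
traj_sum = descend(grad_sum, eta)        # summed cost, learning rate eta
traj_mean = descend(grad_mean, eta * P)  # mean cost, learning rate eta * P

# Largest difference between the two weight trajectories.
max_diff = max(abs(a - b) for a, b in zip(traj_sum, traj_mean))
print(max_diff)
```

The printed difference should be zero up to floating-point rounding, which is the sense in which the factor 1/P is absorbed into $\eta$.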

Hope that helps,

Bernd