ex 7.2

by Saleh Gholam Zadeh

Dear TAs,

I have a question regarding ex 7.2. For me there is an ambiguity between these two possible choices (i.e. where I should put batch normalization):

1) Based on the paper, batch normalization should be applied before the activation function of each layer (after computing w*x + b and before applying the nonlinearity).

2) But in the question it is written "(i.e. before each layer with learnable parameters)", which means after the activation function of the previous layer (see the sketch below).
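
To make the two options concrete, here is a sketch in PyTorch (just to illustrate my question; the sizes 784/100/10 are placeholders I made up, not the ones from the exercise):

    import torch.nn as nn

    # Option 1: batch norm between the affine map and the nonlinearity
    option1 = nn.Sequential(
        nn.Linear(784, 100),
        nn.BatchNorm1d(100),  # normalise w*x + b ...
        nn.ReLU(),            # ... then apply the nonlinearity
        nn.Linear(100, 10),
    )

    # Option 2: batch norm after the previous activation,
    # i.e. before each layer with learnable parameters
    option2 = nn.Sequential(
        nn.BatchNorm1d(784),  # normalise the input of the first learnable layer
        nn.Linear(784, 100),
        nn.ReLU(),
        nn.BatchNorm1d(100),  # normalise the previous activation ...
        nn.Linear(100, 10),   # ... before the next learnable layer
    )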

According to this link, both placements are correct and can be used:

https://machinelearningmastery.com/how-to-accelerate-learning-of-deep-neural-networks-with-batch-normalization/

Which one should we implement for this question?

Best regards,

Saleh

In reply to Saleh Gholam Zadeh

Re: ex 7.2

by Bernd Albert Illing

Hi Saleh,

The original batch-norm paper (Ioffe & Szegedy, 2015) introduces normalised activities y that are passed as input to the next layer (Algorithm 2 in the paper). This means batch norm is applied after the activation function of the previous layer, i.e. before each layer with learnable parameters.

Please implement batch norm like this (variant 2 in your post), as in the sketch below.
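
A minimal sketch, assuming a fully connected PyTorch model (n_in, n_hidden and n_out are placeholders, use the sizes from the exercise template):

    import torch.nn as nn

    n_in, n_hidden, n_out = 784, 100, 10  # placeholder sizes

    model = nn.Sequential(
        nn.BatchNorm1d(n_in),        # before the first layer with learnable parameters
        nn.Linear(n_in, n_hidden),
        nn.ReLU(),
        nn.BatchNorm1d(n_hidden),    # after the previous activation ...
        nn.Linear(n_hidden, n_out),  # ... before the next learnable layer
    )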

I hope this clarifies things. Best regards,

Bernd