CS-456: ex 7.2 | Moodle 18-19

Dear TAs,

I have a question regarding ex 7.2. I see an ambiguity between two possible choices of where to put batch normalization:

1) Based on the paper, batch normalization should be applied before the activation function of each layer (after computing w*x+b and before applying the nonlinearity).

2) But the question says "(i.e. before each layer with learnable parameters)", which means after the activation function of the previous layer.
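Side by side, the two placements differ only in what gets normalized. The notation below is assumed for illustration and is not taken from the exercise: h^{(l-1)} is the output of layer l-1, W^{(l)} and b^{(l)} are the learnable parameters of layer l, \phi is the nonlinearity, and \mathrm{BN}_{\gamma,\beta} is the batch-norm transform (Algorithm 1 of Ioffe & Szegedy, 2015), applied per feature over a mini-batch of size m.

```latex
% Variant 1: normalize the pre-activation of layer l
h^{(l)} = \phi\left(\mathrm{BN}_{\gamma,\beta}\left(W^{(l)} h^{(l-1)} + b^{(l)}\right)\right)

% Variant 2: normalize the input of layer l (= output of layer l-1, after its nonlinearity)
h^{(l)} = \phi\left(W^{(l)}\,\mathrm{BN}_{\gamma,\beta}\left(h^{(l-1)}\right) + b^{(l)}\right)

% In both variants, for a mini-batch \{x_1, \dots, x_m\} of the normalized quantity:
\mathrm{BN}_{\gamma,\beta}(x_i) = \gamma \, \frac{x_i - \mu_\mathcal{B}}{\sqrt{\sigma_\mathcal{B}^2 + \epsilon}} + \beta,
\qquad \mu_\mathcal{B} = \frac{1}{m} \sum_{i=1}^{m} x_i,
\qquad \sigma_\mathcal{B}^2 = \frac{1}{m} \sum_{i=1}^{m} \left(x_i - \mu_\mathcal{B}\right)^2
```

In variant 1 the bias b^{(l)} is redundant: the mean subtraction cancels it and \beta takes over its role, which is why the paper drops it in that setting.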

According to this link, both are correct and can be used:

https://machinelearningmastery.com/how-to-accelerate-learning-of-deep-neural-networks-with-batch-normalization/

Which one should we implement in this question?

Best regards,

Saleh

Re: ex 7.2

by Bernd Albert Illing, Monday, 29 April 2019, 14:19

Hi Saleh,

The original batch-norm paper (Ioffe et al., 2015) introduces normalised activities (y) that are passed as input to the next layer (Algorithm 2 in the paper). This means batch norm is applied after the activation function of the previous layer, i.e. before each layer with learnable parameters.
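A minimal pure-Python sketch of this placement, assuming a plain fully connected network with ReLU hidden units and no nonlinearity on the output layer. The helper names (`batch_norm`, `dense`, `relu`, `forward_variant2`, `init_layer`), the layer sizes and the random initialisation are hypothetical and not part of the exercise. Only training-mode behaviour is shown (mini-batch mean and variance); running estimates for inference are left out.

```python
import math
import random

def batch_norm(batch, gamma, beta, eps=1e-5):
    """Normalise each feature over the mini-batch, then apply the learnable
    per-feature scale (gamma) and shift (beta)."""
    m = len(batch)
    out = [list(sample) for sample in batch]
    for j in range(len(batch[0])):
        col = [sample[j] for sample in batch]
        mean = sum(col) / m
        var = sum((v - mean) ** 2 for v in col) / m  # biased mini-batch variance
        for i in range(m):
            out[i][j] = gamma[j] * (col[i] - mean) / math.sqrt(var + eps) + beta[j]
    return out

def dense(batch, W, b):
    """Affine layer with learnable W (out x in) and b (out)."""
    return [[sum(w_k * x_k for w_k, x_k in zip(row, x)) + b_o
             for row, b_o in zip(W, b)]
            for x in batch]

def relu(batch):
    return [[max(0.0, v) for v in sample] for sample in batch]

def forward_variant2(x, layers):
    """Batch norm right before every layer with learnable parameters:
    on the raw input x and on each hidden activation after its ReLU."""
    h = x
    for idx, (W, b, gamma, beta) in enumerate(layers):
        y = batch_norm(h, gamma, beta)  # normalised activities y ...
        h = dense(y, W, b)              # ... are the input to the next learnable layer
        if idx < len(layers) - 1:       # assumption: no nonlinearity on the output layer
            h = relu(h)
    return h

def init_layer(n_in, n_out):
    """Hypothetical init: gamma/beta have n_in entries because BN acts on the layer's input."""
    W = [[random.gauss(0.0, 0.5) for _ in range(n_in)] for _ in range(n_out)]
    return W, [0.0] * n_out, [1.0] * n_in, [0.0] * n_in

random.seed(0)
layers = [init_layer(3, 4), init_layer(4, 2)]                     # 3 -> 4 -> 2
x = [[random.gauss(5.0, 2.0) for _ in range(3)] for _ in range(8)]  # mini-batch of 8
out = forward_variant2(x, layers)
```

With gamma = 1 and beta = 0, every layer sees zero-mean, unit-variance inputs at initialisation; training gamma and beta lets the network undo the normalisation where that helps.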

Please implement batch norm this way (variant 2 in your post).

I hope this clarifies things. Best regards,

Bernd