[Spoiler] Mock 2022 1.5.4 Neural Networks

by Sara Jun -
Number of replies: 2

Hello,

Can someone explain why the following answers are the correct ones? I am not sure I have understood the effect of removing the first two fully connected layers. Thank you!


In reply to Sara Jun

Re: [Spoiler] Mock 2022 1.5.4 Neural Networks

by Eugène Léo Robert Bergeron -
Hi Sara,

By removing neurons from those layers, you decrease the number of parameters of the MLP part of the CNN. This reduces the model's complexity, which is one way to counter overfitting.

To get the intuition: by removing neurons from the MLP, you force it to "think" less deeply, so it can't capture aspects specific to the training set and is instead forced to generalize (which is good for us).
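
Here is a minimal sketch (assuming PyTorch, with made-up layer sizes) of how much removing the two hidden fully connected layers shrinks the parameter count of the classifier head:

```python
import torch.nn as nn

def count_params(model: nn.Module) -> int:
    return sum(p.numel() for p in model.parameters())

# Hypothetical CNN classifier head with two hidden fully connected layers.
full_head = nn.Sequential(
    nn.Linear(512, 256), nn.ReLU(),
    nn.Linear(256, 128), nn.ReLU(),
    nn.Linear(128, 2),
)

# Same head with the two hidden layers removed: far fewer parameters,
# hence lower model complexity.
reduced_head = nn.Linear(512, 2)

print(count_params(full_head))     # 164482
print(count_params(reduced_head))  # 1026
```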


The second correct answer is weight decay, a regularization method. I personally see weight decay as a way to keep the model from relying on just a few neurons to capture features specific to the training set and basing its output only on those features, which is overfitting. Instead, it is forced to work with all of its parameters and to find connections between many more features than just a few, and therefore to generalize.
Ex: suppose that in your training set of cats and dogs, all images of cats have a dark pixel at coordinates (4, 4). Your lazy but smart model will just look for this dark pixel in any image you give it: if the pixel is there, it outputs cat, otherwise dog. It can do so by having one super strong neuron with an immense weight that tells whether this dark pixel is present. With weight decay, firstly this super strong neuron becomes less strong, which leaves more room for other features to be extracted, and secondly, if your model gets better results by working with more features than just this one pixel, it will generalize more, which is good for us :)
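
In practice, weight decay is usually just a hyperparameter of the optimizer. A minimal sketch, assuming PyTorch and a placeholder model:

```python
import torch

model = torch.nn.Linear(512, 2)  # placeholder model, stands in for the CNN

# weight_decay adds an L2 penalty on the weights at each update, shrinking
# large weights so no single "super strong" neuron can dominate the output.
optimizer = torch.optim.SGD(
    model.parameters(),
    lr=0.01,
    weight_decay=1e-4,  # strength of the penalty (a hyperparameter to tune)
)
```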

By increasing the size of the training set, you train your model on more samples. Remember overfitting in polynomial curve fitting (slide 23 of the KNN lecture): if you add more points, the over-complex curve no longer fits them all, and the model is forced to generalize a bit more until it finds a good curve.
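
A minimal NumPy sketch of that polynomial example (made-up noisy data; the exact numbers are only illustrative):

```python
import numpy as np

rng = np.random.default_rng(0)
DEGREE = 9  # deliberately over-complex model

def test_error(n_points: int) -> float:
    # Noisy samples of sin(2*pi*x), the "true" underlying function.
    x = np.linspace(0, 1, n_points)
    y = np.sin(2 * np.pi * x) + 0.1 * rng.standard_normal(n_points)
    coeffs = np.polyfit(x, y, DEGREE)
    # Evaluate against the noise-free function on a dense grid.
    x_test = np.linspace(0, 1, 200)
    return np.mean((np.polyval(coeffs, x_test) - np.sin(2 * np.pi * x_test)) ** 2)

print(test_error(10))    # few points: the degree-9 curve fits the noise exactly
print(test_error(1000))  # many more points, same degree: much smaller error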

Hope it helps!
Eugène