Using only training data to infer parameters

by Nicolas Alain Alexandre Marie Thierry D'Argenlieu -

Hi,

I was just wondering about the answer to this question in Homework 2:

Question "What are the right mu x and sigma x to use? Why?" from Homework 2

The answer given is that one can only infer parameters from the training data. In the discussion above, we are told to normalize the data by computing its mean and variance and adjusting the data samples accordingly. Does the answer about using training data only hold in this case, or are such parameters always computed from the training data? Can't we use the mean and variance over the whole dataset, or does that introduce a bias in the training process?

Thanks 

Nicolas

In reply to Nicolas Alain Alexandre Marie Thierry D'Argenlieu

Re: Using only training data to infer parameters

by Firas Kanoun -

Hello,

It is preferable to normalize the data using parameters computed from the training data only. You can of course normalize over the whole dataset, but as you mentioned, that would introduce bias in the training process.

We want our test set to be as representative as possible of real data. We therefore don't want to use any information from the test set (including mean and variance) in the training process.
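To make this concrete, here is a minimal sketch of the idea, not the homework's reference solution. The synthetic one-dimensional feature, the 80/20 split, and the helper name `normalize` are all assumptions made for illustration; `mu_x` and `sigma_x` simply mirror the names in the homework question.

```python
import random
import statistics

random.seed(0)
# Synthetic one-dimensional feature, for illustration only.
x = [random.gauss(5.0, 2.0) for _ in range(100)]

# Hypothetical 80/20 split into training and test samples.
x_train, x_test = x[:80], x[80:]

# Estimate mu_x and sigma_x from the training samples only.
mu_x = statistics.fmean(x_train)
sigma_x = statistics.pstdev(x_train)


def normalize(samples, mu, sigma):
    """Shift and scale samples with the given (training) parameters."""
    return [(s - mu) / sigma for s in samples]


# The same training parameters are applied to both sets;
# nothing is ever estimated from x_test.
x_train_norm = normalize(x_train, mu_x, sigma_x)
x_test_norm = normalize(x_test, mu_x, sigma_x)
```

After this step the training samples have zero mean and unit variance by construction, while the test samples are only approximately standardized. That small mismatch is expected: it is what new, unseen data would look like too.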

I hope this answers your question.

Best,

Firas