[Spoiler] Mock Exam 2020 - Parameter in the regularization term

by Tsz Kin Brian Tsang

Hello, I would like to understand how the parameter in the regularization term is chosen for linear regression and logistic regression.


In this solution, the lambda in the regularization term is 1/3:


I compared this with the solution in another version of the mock paper:


There, the lambda in the solution is 1/6:


How is the value of lambda in the regularization term determined?

Moreover, when we do polynomial feature expansion, how do we determine the number of dimensions after the expansion?

For example, if x in R^2, x = [xi1, xi2], the number of dimensions after expansion becomes 6, corresponding to [1, xi1, xi2, xi1^2, xi2^2, xi1xi2].

Thank you for your time. 

In reply to Tsz Kin Brian Tsang

Re: [Spoiler] Mock Exam 2020 - Parameter in the regularization term

by Nicolas Talabot
Usually, this regularization term appears in the total loss function (the overall objective we want to minimize). For logistic regression, the cross-entropy E(w) is what we minimize; if we also want to add L2 regularization, the problem becomes:

w* = argmin_w E(w) + lambda * L_reg(w)

Here lambda is a constant that balances the two terms (similar to the "C" that appears in the SVM formulation). In practice, the value of lambda usually has to be found by trial and error: you test several values and select the one that gives the best performance on a validation set.
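The trial-and-error selection on a validation set can be sketched as follows. This is only a minimal illustration, not the course's procedure: it assumes a one-dimensional linear regression without bias, a loss of the form (1/N) * sum((w*x - y)^2) + lambda * w^2, and made-up toy data; the names fit_ridge_1d, mse and candidates are hypothetical.

```python
def fit_ridge_1d(xs, ys, lam):
    """Minimize (1/N) * sum((w*x - y)^2) + lam * w^2 for a scalar w (closed form)."""
    n = len(xs)
    sxy = sum(x * y for x, y in zip(xs, ys))
    sxx = sum(x * x for x in xs)
    return sxy / (sxx + n * lam)


def mse(xs, ys, w):
    """Mean squared error of the prediction w * x."""
    return sum((w * x - y) ** 2 for x, y in zip(xs, ys)) / len(xs)


# Made-up toy data, split into a training set and a validation set.
train_x, train_y = [0.0, 1.0, 2.0, 3.0], [0.1, 0.9, 2.2, 2.8]
val_x, val_y = [0.5, 1.5, 2.5], [0.5, 1.5, 2.4]

candidates = [0.0, 0.01, 0.1, 1.0, 10.0]  # values of lambda to try

# Fit on the training set for each lambda, score on the validation set.
val_errors = {lam: mse(val_x, val_y, fit_ridge_1d(train_x, train_y, lam))
              for lam in candidates}
best_lambda = min(val_errors, key=val_errors.get)
print("validation errors:", val_errors)
print("selected lambda:", best_lambda)
```

The key point is that the weights are fitted on the training data only, while lambda is compared on data the fit has not seen.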

For the exam examples above, I believe they simply set it to 1/D, where D is the dimension of w (if you look at the sums, there are 3 terms in the first one and 6 in the other).
In that case, I think any value would be considered correct, as it is mostly the formula for the L2 loss that matters (well, based on the info I see here).
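To make the objective E(w) + lambda * L_reg(w) concrete, here is a rough sketch with lambda defaulting to 1/D. Several things are assumptions and not taken from the exam: the L2 term is taken to be the plain sum of squared weights, the cross-entropy is averaged over the samples, labels are 0 or 1, and the function names and data are hypothetical.

```python
import math


def cross_entropy(w, xs, ys):
    """Mean cross-entropy E(w) of a logistic model, labels y in {0, 1}."""
    total = 0.0
    for x, y in zip(xs, ys):
        z = sum(wj * xj for wj, xj in zip(w, x))
        # -log(sigmoid(z)) if y == 1, -log(1 - sigmoid(z)) if y == 0
        total += math.log1p(math.exp(-z)) if y == 1 else math.log1p(math.exp(z))
    return total / len(xs)


def l2_penalty(w):
    """L_reg(w): sum of squared weights (assumed form of the L2 term)."""
    return sum(wj * wj for wj in w)


def objective(w, xs, ys, lam=None):
    """E(w) + lambda * L_reg(w), with lambda = 1/D by default."""
    if lam is None:
        lam = 1.0 / len(w)  # D = dimension of w
    return cross_entropy(w, xs, ys) + lam * l2_penalty(w)


# Made-up example with D = 3, so lambda defaults to 1/3.
xs = [[1.0, 0.5, -1.0], [1.0, -0.3, 2.0]]
ys = [1, 0]
w = [0.2, -0.1, 0.4]
print(objective(w, xs, ys))
```

With a 6-dimensional w, the same default would give lambda = 1/6, which is consistent with the guess above about the two exam versions.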


The number of terms after the polynomial expansion will depend on the initial number of features, the maximum degree of the polynomial, and which terms are kept.
There is probably some formula to deduce the maximum possible number of terms based on that but I don't know it, nor are you expected to for this course.
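For reference only (not required for the course): when every term up to total degree d is kept, a standard counting argument gives C(n + d, d) terms for n input features, including the constant 1. The sketch below enumerates the terms and compares the count with that formula; the function names are hypothetical, and dropping some terms (for example the cross terms) would give a smaller number.

```python
from itertools import combinations_with_replacement
from math import comb


def polynomial_terms(n_features, degree):
    """List every monomial of total degree <= degree as a tuple of feature indices.

    The empty tuple () stands for the constant term 1.
    """
    terms = []
    for d in range(degree + 1):
        terms.extend(combinations_with_replacement(range(n_features), d))
    return terms


def max_num_terms(n_features, degree):
    """Closed-form count when all terms are kept: C(n + d, d)."""
    return comb(n_features + degree, degree)


terms = polynomial_terms(2, 2)
print(terms)
print(len(terms), max_num_terms(2, 2))
```

For 2 features and degree 2, both counts are 6, which matches the expansion [1, xi1, xi2, xi1^2, xi2^2, xi1xi2] from the question (the enumeration order may differ).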