Lab 2a, Cloud Seeding Data, question about summary of clouds.lm

by Darlene Goldstein
Hello. For a factor variable (here cloud seeding, yes/no), the model has one term per factor level minus one: 2 levels give 2 - 1 = 1 term. The reason is that when the model contains an intercept (as it does by default in R), the indicator (dummy) columns for all levels of a factor add up to the intercept column of ones. If you included a column for every level, the design matrix X would be rank deficient (i.e., it would have linearly dependent columns). Why does this matter? The least squares estimator is
(X^T X)^{-1} X^T y,
and if X is rank deficient, X^T X is singular and cannot be inverted, so there is no unique least squares estimate.
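
A minimal sketch of the point above, written in Python/numpy rather than R and using made-up toy data (six hypothetical observations, not the actual cloud seeding data). It builds a design matrix with an intercept plus a dummy for every level, then one with only the 2 - 1 = 1 dummy that treatment coding (R's default for unordered factors) would keep, and compares their ranks.

```python
import numpy as np

# Hypothetical toy data, for illustration only (not the real clouds data).
seeding = np.array(["no", "yes", "no", "yes", "yes", "no"])
y = np.array([2.0, 5.1, 1.8, 4.9, 5.3, 2.2])

intercept = np.ones(len(seeding))
d_no = (seeding == "no").astype(float)
d_yes = (seeding == "yes").astype(float)

# Intercept plus a dummy for EVERY level: d_no + d_yes equals the intercept
# column, so the columns are linearly dependent and X is rank deficient.
X_full = np.column_stack([intercept, d_no, d_yes])

# Treatment coding: drop the baseline level "no" and keep one dummy.
X_trt = np.column_stack([intercept, d_yes])

rank_full = np.linalg.matrix_rank(X_full)  # 2, but X_full has 3 columns
rank_trt = np.linalg.matrix_rank(X_trt)    # 2, equal to its 2 columns

# X^T X is singular for X_full but invertible for X_trt.
rank_xtx_full = np.linalg.matrix_rank(X_full.T @ X_full)

# With full column rank, (X^T X)^{-1} X^T y exists; solve the normal
# equations instead of forming the inverse explicitly.
beta = np.linalg.solve(X_trt.T @ X_trt, X_trt.T @ y)
print(rank_full, X_full.shape[1], rank_trt, X_trt.shape[1])
print(beta)  # intercept = mean of "no" group; slope = mean("yes") - mean("no")
```

Solving the normal equations with `np.linalg.solve` is a common choice over computing the inverse directly; it gives the same estimator when X^T X is invertible.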

The same holds for interaction terms. They do not have to be parameterised the way R does by default, but the number of interaction coefficients that can be estimated equals the number of interaction degrees of freedom. In other words, the interactions are subject to constraints, so they cannot all be estimated freely. For example, for two factors with a and b levels the interaction has (a - 1)(b - 1) degrees of freedom, even though there are a × b level combinations.
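
A small numerical check of the degrees-of-freedom count, again a Python/numpy sketch under assumptions of my own: two hypothetical factors A (2 levels) and B (3 levels) in a balanced layout with one observation per cell. It compares an "overparameterised" design (an indicator for every main-effect level and every interaction cell) with a treatment-coded main-effects design. The extra rank contributed by the interaction is the number of interaction coefficients that can actually be estimated.

```python
import numpy as np
from itertools import product

# Hypothetical balanced layout: factor A with a levels, factor B with b levels,
# one observation per (A, B) cell. These sizes are illustrative assumptions.
a, b = 2, 3
cells = list(product(range(a), range(b)))
A = np.array([i for i, _ in cells])
B = np.array([j for _, j in cells])

def indicators(codes, n_levels):
    # One 0/1 column per level, with no level dropped.
    return np.column_stack([(codes == k).astype(float) for k in range(n_levels)])

ones = np.ones((len(cells), 1))
IA, IB = indicators(A, a), indicators(B, b)
# One indicator column for every (A, B) combination.
IAB = np.column_stack([IA[:, i] * IB[:, j] for i in range(a) for j in range(b)])

# Treatment-coded main effects only: 1 + (a - 1) + (b - 1) columns.
X_main = np.hstack([ones, IA[:, 1:], IB[:, 1:]])
# Overparameterised model: every main-effect and interaction indicator.
X_over = np.hstack([ones, IA, IB, IAB])

rank_main = np.linalg.matrix_rank(X_main)
rank_over = np.linalg.matrix_rank(X_over)
interaction_df = rank_over - rank_main

print(X_over.shape[1], rank_over)         # 12 columns, but rank only 6
print(interaction_df, (a - 1) * (b - 1))  # both 2
```

Of the a × b = 6 interaction indicators, only (a - 1)(b - 1) = 2 add information beyond the main effects; the rest are linear combinations of other columns, which is why constraints (such as R's default coding) are needed to estimate them.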

Does this make sense?

Best regards,
Darlene