Solved – Sum of Squared Errors Chi-Square distribution degrees of freedom in Multiple Linear Regression

chi-squared-distribution, machine-learning, multiple-regression, regression, sums-of-squares

In this link it says that the $Y$ variables have zero covariance (because the covariance matrix has only diagonal terms), which implies they are independent.

In linear regression, $Y$ takes its expectation from a linear function of the $X$ variables and its variance from the error terms. $Y$ is also normally distributed because the errors are normally distributed.

So, if the $Y$ variables have zero covariance, then the errors must have zero covariance too, which means they are independent. Then, in parallel with the last answer, $\sum_{i=1}^{n}(Y_i-\hat{Y}_i)^2$ divided by $\sigma^2$, that is, the sum of squared errors divided by $\sigma^2$, should be chi-square with $n$ degrees of freedom. Why do we get $n-p-1$ degrees of freedom instead?

Could you help me resolve this contradiction?

Best Answer

The beginning of your question is a bit confusing, but the issue of why the degrees of freedom are $n-p-1$ can be addressed directly. The proof is already in the question you point to, so I'll try a quick intuition.

You see, $\frac{\sum_{i=1}^{n}(Y_i-\hat{Y}_i)^2}{\sigma^2}$ is a function of $\hat{Y}_i$, a value obtained from a model with $p+1$ parameters (you have a constant plus $p$ variables $x_i$ in $X$).
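For completeness, here is a sketch of the standard linear-algebra argument behind that count (my addition, assuming the usual model $Y = X\beta + \varepsilon$ with $\varepsilon \sim N(0,\sigma^2 I)$ and a full-rank design matrix $X$ whose first column is the constant):

$$
\hat{Y} = HY,\qquad H = X(X^{\top}X)^{-1}X^{\top},\qquad Y-\hat{Y} = (I-H)\varepsilon,
$$

since $(I-H)X\beta = 0$. The matrix $I-H$ is symmetric and idempotent, so its rank equals its trace, $\operatorname{tr}(I-H) = n-(p+1)$, and therefore

$$
\frac{\sum_{i=1}^{n}(Y_i-\hat{Y}_i)^2}{\sigma^2} = \frac{\varepsilon^{\top}(I-H)\varepsilon}{\sigma^2} \sim \chi^2_{\,n-p-1}.
$$

The residuals are not $n$ independent normals; they are a projection of the errors onto a space of dimension $n-p-1$, which is where the reduced degrees of freedom come from.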

From a statistical intuition point of view, it's natural to expect that you'd subtract this number of parameters from the total number of observations $n$, hence giving you $n-(p+1)$ degrees of freedom.
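If you want to see this numerically, here is a minimal simulation sketch (my own illustration; the values of $n$, $p$, $\beta$ and $\sigma$ are arbitrary, not taken from the linked question) showing that the mean and variance of $\sum_i (Y_i-\hat{Y}_i)^2/\sigma^2$ match a chi-square with $n-p-1$ degrees of freedom rather than $n$:

```python
import numpy as np

# Simulate many datasets from a known linear model, fit OLS each time,
# and look at the distribution of SSE / sigma^2.
rng = np.random.default_rng(0)
n, p, sigma = 50, 3, 2.0                 # n observations, p predictors, known error sd
beta = np.array([1.0, -2.0, 0.5, 3.0])   # intercept plus p slopes (illustrative values)
n_sims = 20_000

scaled_sse = np.empty(n_sims)
for s in range(n_sims):
    X = np.column_stack([np.ones(n), rng.normal(size=(n, p))])   # design: constant + p predictors
    y = X @ beta + rng.normal(scale=sigma, size=n)               # Y = X beta + error
    beta_hat, *_ = np.linalg.lstsq(X, y, rcond=None)             # OLS fit
    resid = y - X @ beta_hat
    scaled_sse[s] = resid @ resid / sigma**2                     # SSE / sigma^2

# A chi-square with k degrees of freedom has mean k and variance 2k.
print("empirical mean:", scaled_sse.mean())      # close to n - (p + 1) = 46, not n = 50
print("empirical variance:", scaled_sse.var())   # close to 2 * (n - p - 1) = 92
```

Fitting the $p+1$ parameters "uses up" that much of the variability in the residuals, which is exactly what the simulation reflects.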