Solved – Degrees of freedom in regression

Tags: degrees-of-freedom, regression

I went through the thread How to understand degrees of freedom? and the great answers in it, but then I read the following in the Wikipedia article on regression:

Statistical assumptions

When the number of measurements, $N$, is larger than the number of unknown parameters, $k$, and the measurement errors $\varepsilon_i$ are normally distributed, then the excess of information contained in $(N - k)$ measurements is used to make statistical predictions about the unknown parameters. This excess of information is referred to as the degrees of freedom of the regression.

Given this definition, if $N$ increases, the degrees of freedom increase as well, but intuitively that would make the problem more constrained (we have more information per parameter). Why is $N-k$ then called the degrees of freedom, and why isn't it the other way around, e.g. $(k-N)$?

Best Answer

You may be confusing the degrees of freedom attributed to different things.

We would not use negative numbers to count, but there are two sides to the ledger.

In common situations, the data degrees of freedom will be $N$, say.

The model degrees of freedom -- the degrees of freedom the model has to fit the data -- is $k$, and the residual degrees of freedom is what's left over: $N-k$. That $k$ may often be partitioned into various components of the model.
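To make the ledger concrete, here is a minimal sketch in Python/NumPy (the particular data, $N = 100$, and $k = 3$ are hypothetical): it fits an ordinary least-squares model with $k$ coefficients to $N$ observations, and the leftover $N - k$ residual degrees of freedom are what the variance estimate is divided by, which is the "excess of information" the Wikipedia quote describes.

```python
import numpy as np

rng = np.random.default_rng(0)
N, k = 100, 3                       # data df = N; model df = k

# Hypothetical design: intercept plus two predictors (k = 3 columns)
X = np.column_stack([np.ones(N), rng.normal(size=(N, 2))])
beta = np.array([1.0, 2.0, -0.5])
y = X @ beta + rng.normal(size=N)

# Ordinary least-squares fit
beta_hat, *_ = np.linalg.lstsq(X, y, rcond=None)
residuals = y - X @ beta_hat

# Residual df = what's left over after the model uses its k df
df_resid = N - k                    # 97
sigma2_hat = residuals @ residuals / df_resid  # unbiased error-variance estimate
print(df_resid, sigma2_hat)
```

Note how adding observations (larger $N$) increases the residual degrees of freedom: more leftover information goes into estimating the error variance, which is why a larger $N - k$ means a better-constrained problem, not a worse one.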

Any of them might be called "the" degrees of freedom depending on what, exactly, is being discussed.

Indeed, we use 'degrees of freedom' more broadly still, whence the appearance of noninteger degrees of freedom for some kinds of models, and references to things like "researcher degrees of freedom".
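As an aside on the noninteger case: for a linear smoother such as ridge regression, the effective model degrees of freedom are commonly taken to be the trace of the hat matrix, which is generally not an integer. A minimal sketch (the data and the penalty $\lambda = 10$ are hypothetical):

```python
import numpy as np

rng = np.random.default_rng(1)
N, p, lam = 50, 4, 10.0

X = rng.normal(size=(N, p))

# Ridge "hat" matrix H maps y to fitted values: H = X (X'X + lam*I)^{-1} X'
H = X @ np.linalg.solve(X.T @ X + lam * np.eye(p), X.T)

edf = np.trace(H)   # effective model degrees of freedom, typically noninteger
print(edf)          # strictly between 0 and p for lam > 0
```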
