Solved – Lack of fit and Pure error

references, regression, repeated measures, residuals, self-study

I don't understand the concepts of lack of fit error and pure error.

What I know is:

$\bullet$ Lack of fit error: Error that occurs when the analysis omits one or more important terms or factors from the process model.

$\bullet$ Pure error: It arises when the dependent variable, Y, is observed repeatedly at a fixed value of the independent variable, X.

Can you please explain these two terms to me?

If there are no repeated observations, does pure error not occur? If it does not, can lack of fit error still occur? Do they need to occur simultaneously?
Can't lack of fit error alone contribute to the residual? That is,
$$\text{Residual Error=Lack of fit error + Pure Error}$$

If Pure Error = 0, can't we then have
$$\text{Residual Error=Lack of fit error}$$

Since we do the test
$$F=\frac{\text{Mean Square due to Lack of fit}}{\text{Mean Square due to Pure error}}$$

it seems good to have a larger value of pure error, since that makes $F$ smaller. But doesn't a large pure error imply that our observations are very heterogeneous? Isn't it better to have homogeneous units?

Best Answer

SSE, SSPE, SSLF

  • SSE (sum-squares due to error) is defined as $\sum_{i=1}^n(y_i-\hat{y}_i)^2$, where $n$ is the number of observations
  • SSE can be decomposed into 2 components: SSPE (sum-squares due to pure error) and SSLF (sum-squares due to lack of fit): $SSE = SSPE + SSLF$
  • SSPE measures the inherent variability of $y$ which cannot be explained by any model.
  • SSLF represents variability of $y$ that cannot be explained by the given model. This value may be reduced if a better model is used.
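To make the decomposition concrete, here is a minimal sketch in Python with NumPy. The data are made up for illustration (roughly quadratic in $x$, with two replicates at each $x$ value), and the function name `decompose_sse` is my own, not something from a library. It fits a straight line, then splits SSE into SSPE (spread of each $y$ around the mean of its own $x$ group) and SSLF (distance of each group mean from the fitted line).

```python
import numpy as np


def decompose_sse(x, y, deg=1):
    """Return (SSE, SSPE, SSLF) for a least-squares polynomial fit of degree `deg`."""
    x = np.asarray(x, dtype=float)
    y = np.asarray(y, dtype=float)

    # Least-squares polynomial fit; deg=1 is a straight line.
    coeffs = np.polyfit(x, y, deg)
    y_hat = np.polyval(coeffs, x)

    # Mean of y over all observations sharing the same x, one entry per observation.
    group_mean = np.array([y[x == xi].mean() for xi in x])

    sse = np.sum((y - y_hat) ** 2)            # residuals from the fitted model
    sspe = np.sum((y - group_mean) ** 2)      # replicates around their group mean
    sslf = np.sum((group_mean - y_hat) ** 2)  # group means around the fitted model
    return sse, sspe, sslf


# Hypothetical data: four x levels, two replicates each.
x = [1, 1, 2, 2, 3, 3, 4, 4]
y = [1.1, 0.9, 4.2, 3.8, 9.1, 8.9, 16.2, 15.8]

sse, sspe, sslf = decompose_sse(x, y)
print(f"SSE={sse:.3f}  SSPE={sspe:.3f}  SSLF={sslf:.3f}")
```

With these numbers almost all of SSE ends up in SSLF, which is what you would expect when a line is fitted to curved data. Passing `deg=2` should move most of that into a much smaller SSLF while leaving SSPE unchanged, since SSPE does not depend on the model.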

Lack-of-fit (LoF) F-test

We can perform a LoF test only if there are repeated measurements. We test

$H_0$: There is no lack of fit in the model, vs

$H_1$: There is a lack of fit.

$$ F = \frac{MSLF}{MSPE}= \frac{SSLF/df_1}{SSPE/{df_2}}$$

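where $df_1$ is the degrees of freedom for SSLF and $df_2$ is the degrees of freedom for SSPE. With $m$ distinct values of $x$, $n_j$ observations at the $j$-th value, $n = \sum_{j=1}^m n_j$ observations in total, and $p$ parameters in the model, $df_2 = \sum_{j=1}^m(n_j-1) = n - m$ and $df_1 = m - p$.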

(Without repeated measurements, $SSPE = 0$ and $df_2 = 0$, so $MSPE$ is undefined and hence you cannot conduct the LoF test.)
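As a rough sketch of how the test statistic could be computed, assuming NumPy and SciPy are available: the function `lack_of_fit_test` and the data are hypothetical, and the model is assumed to be a polynomial of degree `deg`, so it has $p = deg + 1$ parameters. With $m$ distinct $x$ values and $n$ observations, the degrees of freedom are taken as $df_1 = m - p$ and $df_2 = n - m$.

```python
import numpy as np
from scipy import stats


def lack_of_fit_test(x, y, deg=1):
    """Return (F, df1, df2, p_value) for the lack-of-fit F-test of a polynomial fit."""
    x = np.asarray(x, dtype=float)
    y = np.asarray(y, dtype=float)

    coeffs = np.polyfit(x, y, deg)
    y_hat = np.polyval(coeffs, x)
    group_mean = np.array([y[x == xi].mean() for xi in x])

    sspe = np.sum((y - group_mean) ** 2)
    sslf = np.sum((group_mean - y_hat) ** 2)

    n = len(y)             # total number of observations
    m = len(np.unique(x))  # number of distinct x values
    p = deg + 1            # parameters in the polynomial model
    df1 = m - p            # degrees of freedom for lack of fit
    df2 = n - m            # degrees of freedom for pure error

    # No replicates (df2 == 0) or too few distinct x values (df1 <= 0): test undefined.
    if df1 <= 0 or df2 <= 0:
        raise ValueError("lack-of-fit test needs replicates and more x levels than parameters")

    f_stat = (sslf / df1) / (sspe / df2)
    # Upper-tail probability of the F distribution with (df1, df2) degrees of freedom.
    p_value = stats.f.sf(f_stat, df1, df2)
    return f_stat, df1, df2, p_value


# Hypothetical data: roughly quadratic response, two replicates per x level.
x = [1, 1, 2, 2, 3, 3, 4, 4]
y = [1.1, 0.9, 4.2, 3.8, 9.1, 8.9, 16.2, 15.8]

f_stat, df1, df2, p_value = lack_of_fit_test(x, y)
print(f"F={f_stat:.2f} on ({df1}, {df2}) df, p={p_value:.4f}")
```

Calling the function on data with no repeated $x$ values raises the `ValueError`, which mirrors the point that the test cannot be run without replicates.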

Interpretation

  • If the LoF test is significant ($F_{observed} > F_\alpha$), we should look for an alternate model.
  • If the LoF test is not significant ($F_{observed} < F_\alpha$), there is no evidence of lack of fit, so it is not necessary to find a more complicated model.
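The decision rule in the two bullets can be sketched like this, assuming SciPy is available. The helper `lof_decision` is hypothetical; it takes $F_\alpha$ to be the upper-$\alpha$ critical value of the $F$ distribution with $(df_1, df_2)$ degrees of freedom, computed with `scipy.stats.f.ppf`.

```python
from scipy import stats


def lof_decision(f_observed, df1, df2, alpha=0.05):
    """Return (reject_h0, f_critical) for the lack-of-fit F-test at level alpha."""
    # Critical value F_alpha: the (1 - alpha) quantile of F(df1, df2).
    f_critical = float(stats.f.ppf(1 - alpha, df1, df2))
    # Reject H0 (no lack of fit) when the observed statistic exceeds F_alpha.
    reject_h0 = bool(f_observed > f_critical)
    return reject_h0, f_critical


# Hypothetical inputs: an observed F of 80 on (2, 4) degrees of freedom.
reject, f_crit = lof_decision(80.0, df1=2, df2=4)
if reject:
    print(f"F > F_alpha ({f_crit:.2f}): evidence of lack of fit, look for another model")
else:
    print(f"F <= F_alpha ({f_crit:.2f}): no evidence of lack of fit")
```

Comparing against the critical value is equivalent to checking whether the p-value is below $\alpha$, so either form of the rule can be used.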

Side note: these are really just the gist of what I know from the regression analysis course I just took. For more awesomeness, I'd suggest consulting the two books below.

Really awesome textbooks
