Solved – Does the p-value in the incremental F-test determine how many trials I expect to get correct?

anova, f-test, multicollinearity, polynomial

I've implemented an incremental F-test program that evaluates the fit of an unrestricted model $M_{UR}$ against a restricted model $M_R$ using the F statistic $F = \frac{SSE_{R} - SSE_{UR}}{SSE_{UR}}\cdot\frac{n-p-1}{j}$. In this instance, I'm comparing a polynomial of order $p$ against the restricted model of order $p-1$, which necessarily makes $j = 1$.
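For reference, here is a minimal sketch of that comparison in Python; the use of numpy.polyfit for the fits and scipy.stats.f for the p-value, as well as the function name, are illustrative choices and not necessarily how my program is written:

```python
# Sketch of the incremental F-test between a degree-(p-1) fit (restricted)
# and a degree-p fit (unrestricted); there is j = 1 restriction.
import numpy as np
from scipy.stats import f as f_dist

def incremental_f_test(x, y, p):
    sse_r  = np.sum((y - np.polyval(np.polyfit(x, y, p - 1), x)) ** 2)  # restricted SSE
    sse_ur = np.sum((y - np.polyval(np.polyfit(x, y, p), x)) ** 2)      # unrestricted SSE
    j = 1
    f_stat = (sse_r - sse_ur) / sse_ur * (len(y) - p - 1) / j
    p_value = f_dist.sf(f_stat, j, len(y) - p - 1)
    return f_stat, p_value
```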

To validate this program, I create data from randomly generated polynomials, add Gaussian noise, and check whether the incremental F-test recovers the correct polynomial order (i.e. if the data come from a $3^{rd}$-order polynomial, I expect to get order $3$). In detail, the framework is as follows (a minimal code sketch is given after the list):

For i = 1 : n_trials:
    1. Randomly choose a polynomial order between 2 and 10
    2. Populate the coefficients of this polynomial with values between -5 and 5
    3. Evaluate this polynomial at abscissa values X = [0,0.01,0.02,...3.00]
    4. Add Gaussian noise with standard deviation 0.01 to each output of P(X)
    5. For p = 3:10 :
           a. Fit the tuples (X,P(X)) using polynomials of order p-1 and p
           b. Compare the two fits using the F-test; if the test fails, exit the loop.
              If it passes, increase p to p+1 and continue
    6. Return the last polynomial order p-1 that passed the F-test (at significance level 0.05)
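A self-contained sketch of this loop (Python/numpy; the constants mirror the description above, and the compact F-test helper here is a variant of the earlier sketch that returns only the p-value):

```python
# Sketch of the validation loop described above; all values follow the
# description rather than the original program.
import numpy as np
from scipy.stats import f as f_dist

def incremental_f_test(x, y, p):
    # Degree p-1 (restricted) vs degree p (unrestricted), j = 1; returns the p-value.
    sse = lambda d: np.sum((y - np.polyval(np.polyfit(x, y, d), x)) ** 2)
    f_stat = (sse(p - 1) - sse(p)) / sse(p) * (len(y) - p - 1)
    return f_dist.sf(f_stat, 1, len(y) - p - 1)

def estimate_order(x, y, alpha=0.05, max_order=10):
    for p in range(3, max_order + 1):
        if incremental_f_test(x, y, p) > alpha:   # degree-p term not justified: stop
            return p - 1
    return max_order

rng = np.random.default_rng(0)
x = np.arange(0, 3.001, 0.01)                     # X = [0, 0.01, ..., 3.00]
n_trials, n_errors = 3000, 0
for _ in range(n_trials):
    true_order = int(rng.integers(2, 11))                 # random order in {2, ..., 10}
    coeffs = rng.uniform(-5, 5, size=true_order + 1)      # coefficients in [-5, 5]
    y = np.polyval(coeffs, x) + rng.normal(0, 0.01, x.size)  # Gaussian noise, sd 0.01
    n_errors += estimate_order(x, y) != true_order
print(f"{n_errors} misidentified orders out of {n_trials}")
```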

Having done this for $n_{trials} = 3000$, I find that the algorithm misidentifies the order roughly $200$ to $300$ times. However, if I've chosen a significance level of $0.05$, shouldn't I expect errors only about $5\%$ of the time, i.e. $0.05\cdot3000 = 150$?

I also noticed that, if I change the range of X from $[0, 0.01, … ,3.00]$ to $[0, 0.1, … , 30.0]$, the F-test fails much more frequently, even though the number of data points is the same between the two experiments! Is this an artifact of the multicollinearity problem with polynomials?

Best Answer

There are a lot of issues here. The question specifically is about the difference in performance based on the range of values of $x$. This is easily explained. These tests compare the variation in the residuals to the variation captured by the fits. A polynomial of degree $d$ with coefficients bounded in absolute value by $k$ (equal to $5$ here) can have a range over the interval $[0, u]$ at least equal to $k\left(u + u^2 + \cdots + u^d\right) = k\,u\left(u^{d}-1\right)/\left(u-1\right)$. When you change $u$ from $3$ to $30$, the change in potential ranges is huge: for $d=10$, the maximum in the first case is on the order of $3^{11}$, and in the second case it is $10^{10}$ times as great. At that point the noise (whose standard deviation is a tiny $0.01$) is inconsequential. Thus, even when the coefficient of $x^{10}$ is incredibly tiny, it will have an important (and therefore detectable) effect on the data.
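A quick sanity check of those numbers (Python; this just evaluates the geometric-series bound given above for $k=5$, $d=10$):

```python
# Maximum spread k*u*(u**d - 1)/(u - 1) from the bound above, for k = 5, d = 10.
k, d = 5, 10
bound = lambda u: k * u * (u**d - 1) / (u - 1)
print(bound(3), bound(30))       # ~4.4e5 versus ~3.1e15
print(bound(30) / bound(3))      # ratio on the order of 1e10
```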

Here is a plot of ten of your random polynomials (all of order $10$). Note the astronomical scale on the y-axis and observe how the highest term dominates the values.

Figure 1

You ought to consider a different universe of models. For instance, use polynomials of the form

$$p(x) = \sum_{i=0}^d \alpha_i \left(\frac{x}{u}\right)^i$$

defined on the range $[0,u]$. Here is a collection of them, once more with the coefficients varying randomly in $[-5,5]$ and all still of tenth order:

Figure 2

A rigorous test would add noise with a standard deviation comparable to the variation in the polynomial values: around $10$ or so.
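Here is a sketch of that setup (Python/numpy; the choices of $u$, the coefficient range, and the noise level follow the suggestions above, but the specific values are only illustrative):

```python
# Generate one rescaled polynomial p(x) = sum_i alpha_i * (x/u)**i on [0, u],
# with coefficients in [-5, 5], then add noise of comparable scale (sd ~ 10).
import numpy as np

rng = np.random.default_rng(0)
u, d = 3.0, 10
x = np.arange(0, u + 1e-9, 0.01)
alpha = rng.uniform(-5, 5, size=d + 1)
y = sum(a * (x / u) ** i for i, a in enumerate(alpha))  # values stay within ~±55
y_noisy = y + rng.normal(0, 10, size=x.size)            # noise commensurate with the signal
```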

There are other concerns here: please read the replies by @gung and @jbowman. Consider, too, that you are using a restricted version of forward stepwise regression, and do some research on the pros and cons of that approach to model building. Finally, note that unless theory specifically indicates a polynomial model and suggests its order, fitting polynomials to data can be a deceptively poor approach: even a little overfitting can produce grossly bad models, because higher-degree polynomials can (and often do) vary wildly between the data points and extrapolate horribly.
