Solved – How to interpret the p-value of a regression coefficient that is nearly 0

hypothesis testing, p-value, regression coefficients, statistical significance

When a regression coefficient is nearly 0 (in fact, in the true model it is exactly 0), what does a small p-value (< 0.05) for that coefficient mean?

For example, I ran a multiple regression on simulated data in R with lm().

I generated the simulated data from the equation
$$
y=2x_1^2+3x_2^2+3x_1+5
$$

The true coefficients of the $x_2$ and $x_1x_2$ terms are therefore zero. I then fit a regression to these data:

library(plot3D)                                  # mesh() is not base R; it comes from e.g. the plot3D package
xmesh <- mesh(seq(-4, 4, 0.1), seq(-4, 4, 0.1))  # 81 x 81 grid over [-4, 4]^2
x1 <- as.vector(xmesh$x)
x2 <- as.vector(xmesh$y)
y <- 2*x1^2 + 3*x2^2 + 3*x1 + 5                  # exact, noise-free response
model <- lm(y ~ x1 + x2 + I(x1^2) + I(x2^2) + I(x1*x2))
summary(model)

The result is:

Call:
lm(formula = y ~ x1 + x2 + I(x1^2) + I(x2^2) + I(x1 * x2))

Residuals:
       Min         1Q     Median         3Q        Max 
-8.871e-12 -4.500e-15 -7.000e-16  5.700e-15  4.194e-12 

Coefficients:
              Estimate Std. Error    t value Pr(>|t|)    
(Intercept)  5.000e+00  3.301e-15  1.515e+15  < 2e-16 ***
x1           3.000e+00  7.545e-16  3.976e+15  < 2e-16 ***
x2          -3.348e-15  7.545e-16 -4.438e+00 9.22e-06 ***
I(x1^2)      2.000e+00  3.609e-16  5.542e+15  < 2e-16 ***
I(x2^2)      3.000e+00  3.609e-16  8.314e+15  < 2e-16 ***
I(x1 * x2)  -9.377e-16  3.227e-16 -2.906e+00  0.00367 ** 
---
Signif. codes:  0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1 

Residual standard error: 1.429e-13 on 6555 degrees of freedom
Multiple R-squared:     1,  Adjusted R-squared:     1 
F-statistic: 2.313e+31 on 5 and 6555 DF,  p-value: < 2.2e-16 

We can see that the coefficients of the $x_2$ and $x_1x_2$ terms are nearly 0, yet their p-values are below 0.01. As far as I know, lm() tests each coefficient with a t-test of the null hypothesis $\beta=0$, so p < 0.05 should mean the coefficient is significantly different from 0. But in the model that generated the data these coefficients are exactly 0. I am confused: how should I interpret the significance of these two coefficients?
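
For reference, the t value that summary() reports is just Estimate / Std. Error, and the p-value is the two-sided tail probability of a t distribution on the residual degrees of freedom; a minimal sketch that reproduces them from the fitted model:

ct <- coef(summary(model))                         # Estimate / Std. Error / t value / Pr(>|t|)
t_manual <- ct[, "Estimate"] / ct[, "Std. Error"]
p_manual <- 2 * pt(-abs(t_manual), df = model$df.residual)
cbind(t_manual, p_manual)                          # matches summary()'s t value and Pr(>|t|) columns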

I added a second test with $y=2x_1^2+3x_1+0.001x_2+5$:

> y2 <- 2*x1^2 + 3*x1 + 0.001*x2 + 5
> model3 <- lm(y2 ~ x1 + x2 + I(x1^2) + I(x2^2) + I(x1*x2))
> summary(model3)

Call:
lm(formula = y2 ~ x1 + x2 + I(x1^2) + I(x2^2) + I(x1 * x2))

Residuals:
       Min         1Q     Median         3Q        Max 
-9.237e-12 -1.700e-15 -1.000e-16  2.200e-15  2.757e-12 

Coefficients:
              Estimate Std. Error    t value Pr(>|t|)    
(Intercept)  5.000e+00  2.840e-15  1.761e+15   <2e-16 ***
x1           3.000e+00  6.492e-16  4.621e+15   <2e-16 ***
x2           1.000e-03  6.492e-16  1.540e+12   <2e-16 ***
I(x1^2)      2.000e+00  3.105e-16  6.441e+15   <2e-16 ***
I(x2^2)     -2.722e-16  3.105e-16 -8.770e-01    0.381    
I(x1 * x2)  -3.226e-16  2.776e-16 -1.162e+00    0.245    
---
Signif. codes:  0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1 

Residual standard error: 1.229e-13 on 6555 degrees of freedom
Multiple R-squared:     1,  Adjusted R-squared:     1 
F-statistic: 1.257e+31 on 5 and 6555 DF,  p-value: < 2.2e-16 

You can see that in this second test the estimated coefficients and standard errors of the $x_1x_2$ and $x_2^2$ terms are essentially zero, and their p-values are large, so we fail to reject the null hypothesis $\beta=0$. That is the result I would expect.

How should I interpret the p-values of the essentially-zero coefficients in these two tests?

Best Answer

This has more to do with how computers work than with p-values. Remember that computers cannot represent real numbers exactly: they work with floating-point numbers. So some computations will never produce exactly zero, even when the result is analytically zero. For example, (0.3 - 0.2) - (0.2 - 0.1) will not give you zero.
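
On a typical machine using IEEE-754 double precision, for example:

> (0.3 - 0.2) - (0.2 - 0.1)   # analytically 0, but not in floating point
[1] -2.775558e-17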

You can see that the estimates are essentially zero:

> all.equal(-3.348e-15, 0)
[1] TRUE
> all.equal(-9.377e-16, 0)
[1] TRUE
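
all.equal() compares numbers up to a default tolerance of about 1.5e-8 (the square root of machine epsilon), so values at the 1e-15 level count as numerically zero even though an exact comparison would not:

sqrt(.Machine$double.eps)   # ~1.5e-8, all.equal()'s default tolerance
-3.348e-15 == 0             # FALSE: exact comparison still sees a difference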

The same goes for your standard errors: they too are at rounding-error scale. Because the response has no noise, the residuals are pure rounding error, so the reported t value is the ratio of two rounding-error quantities, and the resulting p-value tells you nothing about the true coefficient.
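
A quick way to see that the whole fit lives at rounding-error scale (sigma() extracts the residual standard error from an lm fit):

sigma(model)          # ~1.4e-13: the residual "error" is purely numerical
.Machine$double.eps   # ~2.2e-16: spacing of double-precision numbers near 1

The residual standard error is only a few orders of magnitude above machine epsilon: the regression has recovered the generating function exactly, and everything below that scale, including the two tiny coefficients, is floating-point noise.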
