When a regression coefficient is nearly 0 (and in fact exactly 0 in the true model), what is the meaning of a p-value (<0.05) for that coefficient?
For example, I ran a multiple regression on simulated data in R with lm(), generating the data from the equation
$$
y=2x_1^2+3x_2^2+3x_1+5
$$
The coefficients of the $x_1x_2$ and $x_2$ terms are zero. I then fit the regression to these data:
xmesh = mesh(seq(-4, 4, 0.1), seq(-4, 4, 0.1))  # mesh() is not base R; expand.grid() is a base-R alternative
x1 = as.vector(xmesh$x)
x2 = as.vector(xmesh$y)
y = 2*x1^2 + 3*x2^2 + 3*x1 + 5
model = lm(y ~ x1 + x2 + I(x1^2) + I(x2^2) + I(x1*x2))
summary(model)
The result is:
Call:
lm(formula = y ~ x1 + x2 + I(x1^2) + I(x2^2) + I(x1 * x2))
Residuals:
Min 1Q Median 3Q Max
-8.871e-12 -4.500e-15 -7.000e-16 5.700e-15 4.194e-12
Coefficients:
Estimate Std. Error t value Pr(>|t|)
(Intercept) 5.000e+00 3.301e-15 1.515e+15 < 2e-16 ***
x1 3.000e+00 7.545e-16 3.976e+15 < 2e-16 ***
x2 -3.348e-15 7.545e-16 -4.438e+00 9.22e-06 ***
I(x1^2) 2.000e+00 3.609e-16 5.542e+15 < 2e-16 ***
I(x2^2) 3.000e+00 3.609e-16 8.314e+15 < 2e-16 ***
I(x1 * x2) -9.377e-16 3.227e-16 -2.906e+00 0.00367 **
---
Signif. codes: 0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1
Residual standard error: 1.429e-13 on 6555 degrees of freedom
Multiple R-squared: 1, Adjusted R-squared: 1
F-statistic: 2.313e+31 on 5 and 6555 DF, p-value: < 2.2e-16
We can see that the coefficients of the $x_2$ and $x_1x_2$ terms are nearly 0, yet their p-values are below 0.01. As I understand it, lm() tests each coefficient with a t-test under the null hypothesis $\beta=0$, so a p-value < 0.05 should mean the coefficient is significantly different from 0. But in my model those coefficients should be exactly 0, which confuses me. How should I interpret the significance of these two coefficients?
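(For reference, the p-value in the coefficient table can be reproduced from the estimate and standard error alone. Here is a sketch in Python, with scipy's t distribution standing in for R's pt(); the numbers are taken from the x2 row of the output above.)

```python
from scipy import stats

# Numbers from the x2 row of summary(model) above
estimate = -3.348e-15
std_error = 7.545e-16
df = 6555  # residual degrees of freedom

t_value = estimate / std_error               # about -4.438, matching the table
p_value = 2 * stats.t.sf(abs(t_value), df)   # two-sided p-value, about 9.2e-06
print(t_value, p_value)
```

So the reported p-value is just the two-sided t-test of estimate / standard error; it says nothing about whether the estimate itself is numerically meaningful.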
Adding a second test with $y=2x_1^2+3x_1+0.001x_2+5$:
> y2=2*x1^2+3*(x1)+5+0.001*x2
> model3=lm(y2~x1+x2+I(x1^2)+I(x2^2)+I(x1*x2))
> summary(model3)
Call:
lm(formula = y2 ~ x1 + x2 + I(x1^2) + I(x2^2) + I(x1 * x2))
Residuals:
Min 1Q Median 3Q Max
-9.237e-12 -1.700e-15 -1.000e-16 2.200e-15 2.757e-12
Coefficients:
Estimate Std. Error t value Pr(>|t|)
(Intercept) 5.000e+00 2.840e-15 1.761e+15 <2e-16 ***
x1 3.000e+00 6.492e-16 4.621e+15 <2e-16 ***
x2 1.000e-03 6.492e-16 1.540e+12 <2e-16 ***
I(x1^2) 2.000e+00 3.105e-16 6.441e+15 <2e-16 ***
I(x2^2) -2.722e-16 3.105e-16 -8.770e-01 0.381
I(x1 * x2) -3.226e-16 2.776e-16 -1.162e+00 0.245
---
Signif. codes: 0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1
Residual standard error: 1.229e-13 on 6555 degrees of freedom
Multiple R-squared: 1, Adjusted R-squared: 1
F-statistic: 1.257e+31 on 5 and 6555 DF, p-value: < 2.2e-16
You can see that in this second test the coefficients and standard errors of the $x_1x_2$ and $x_2^2$ terms are essentially zero, and their p-values are large enough that we fail to reject the null hypothesis $\beta=0$. That is a sensible result.
How should I interpret the p-values of the essentially-zero coefficients in these two tests?
Best Answer
This has more to do with how computers work than with p-values. Remember that computers cannot represent real numbers exactly: they work with floating-point numbers, so some computations will never return exactly zero even when the result is zero analytically. For example,

(0.3 - 0.2) - (0.2 - 0.1)

does not evaluate to zero. The same thing is happening in your regression. The estimates of the $x_2$ and $x_1x_2$ coefficients are essentially zero (on the order of $10^{-15}$), and so are their standard errors (on the order of $10^{-16}$). Both numbers are pure floating-point rounding artifacts, so their ratio, the t value, and the p-value computed from it, carry no statistical meaning.
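A minimal demonstration of that expression (shown here in Python; R behaves the same way, since both use IEEE-754 double precision):

```python
# Analytically this is 0.1 - 0.1 = 0, but in double precision it is not
diff = (0.3 - 0.2) - (0.2 - 0.1)
print(diff)       # a tiny nonzero number, about -2.8e-17

# The "zero" coefficients in the regression are artifacts of the same kind:
# nonzero only at the level of accumulated rounding error
print(diff == 0)  # False
```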