Solved – Standard error for the sum of regression coefficients when the covariance is negative

covariance, linear model, regression, regression coefficients, standard error

I have a question about appropriately calculating the standard error for the sum of two coefficients in a linear regression model. My question is similar to this and this, but I can't seem to solve the problem from the answers presented there.

I have a model of the following form:

$Y = \alpha + \beta_1 X_1 + \beta_2 X_2 + \beta_3 X_1 X_2$

I would like to be able to calculate the marginal effect of $X_2$ on $Y$ for $X_1 = 0$ and $X_1 = 1$. I think I am right in saying that the point estimates for these marginal effects are simply $\beta_2$ in the first case and $\beta_2 + \beta_3$ in the second.
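
(Indeed, differentiating the model with respect to $X_2$ gives the marginal effect $\partial Y/\partial X_2 = \beta_2 + \beta_3 X_1$, which is $\beta_2$ at $X_1 = 0$ and $\beta_2 + \beta_3$ at $X_1 = 1$.)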

My question concerns the appropriate calculation of the standard error. I believe this is the correct formula for the standard error of the $\beta_2 + \beta_3$ point estimate:

$SE_{\beta_2+\beta_3} = \sqrt{SE_2^2 + SE_3^2 + 2\operatorname{Cov}(\beta_2,\beta_3)}$
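
In matrix form this is $\sqrt{a^{\top} V a}$, where $V$ is the coefficient covariance matrix and $a$ is a weight vector selecting the coefficients of interest. As a minimal sketch of that computation, assuming Python with numpy (the function and variable names here are illustrative, not from any particular package):

    import numpy as np

    def se_of_combination(V, a):
        # Standard error of the linear combination a'beta,
        # computed as sqrt(a' V a) from the coefficient covariance matrix V.
        a = np.asarray(a, dtype=float)
        return float(np.sqrt(a @ V @ a))

    # Toy 2x2 covariance matrix for (beta_2, beta_3); a = [1, 1] sums them.
    V = np.array([[ 0.5, -0.1],
                  [-0.1,  0.3]])
    print(se_of_combination(V, [1.0, 1.0]))  # sqrt(0.5 + 0.3 - 2*0.1) ~= 0.7746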

However, the problem arises from the fact that the model that I am estimating produces a covariance matrix that looks like this:

              Constant        beta_1        beta_2        beta_3
Constant  3.938580e-06 -6.259416e-06 -1.397691e-06  2.242824e-06
beta_1   -6.259416e-06  1.187334e-04  2.222736e-06 -4.738965e-05
beta_2   -1.397691e-06  2.222736e-06  5.457572e-07 -8.701802e-07
beta_3    2.242824e-06 -4.738965e-05 -8.701802e-07  2.004982e-05

Plugging the relevant values into the formula above results in a negative number under the square root, which clearly can't be right!

1.409763e-08 + 4.019951e-10 + 2*(-8.701802e-07) = -1.725861e-06

More generally, this seems to be a problem that would arise whenever the covariance of two estimated parameters is negative and larger (in absolute terms) than half the sum of the variances of those parameters. On the other hand, I may just be making a mistake somewhere. If anyone has any suggestions as to where I might be going wrong, it would be most appreciated.

Best Answer

To elaborate a bit on (and, in fact, make more precise) my part of the discussion in the comments:

Variance-covariance matrices are positive semi-definite, as discussed for example in @DilipSarwate's answer here:

The variance of a weighted sum $\sum_i a_i X_i$ of random variables must be nonnegative for all choices of real numbers $a_i$. Since the variance can be expressed as $$\operatorname{var}\left(\sum_i a_i X_i\right) = \sum_i \sum_j a_ia_j \operatorname{cov}(X_i,X_j) = \sum_i \sum_j a_ia_j \Sigma_{i,j},$$ we have that the covariance matrix $\Sigma = [\Sigma_{i,j}]$ must be positive semidefinite (which is sometimes called nonnegative definite).

(As the general notation suggests, this issue has nothing to do with coefficient estimates in particular; it applies to all random variables.)
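
As a concrete check (a sketch assuming Python with numpy; the matrix is the one reported in the question), positive semidefiniteness can be verified by confirming that all eigenvalues of the covariance matrix are nonnegative:

    import numpy as np

    # Coefficient covariance matrix from the question
    # (order: constant, beta_1, beta_2, beta_3).
    V = np.array([
        [ 3.938580e-06, -6.259416e-06, -1.397691e-06,  2.242824e-06],
        [-6.259416e-06,  1.187334e-04,  2.222736e-06, -4.738965e-05],
        [-1.397691e-06,  2.222736e-06,  5.457572e-07, -8.701802e-07],
        [ 2.242824e-06, -4.738965e-05, -8.701802e-07,  2.004982e-05],
    ])

    # For a valid covariance matrix, every eigenvalue should be >= 0
    # (up to floating-point rounding error).
    print(np.linalg.eigvalsh(V))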

If you specialize the variance formula above to your problem with $a_1=a_2=1$, you obtain the well-known result that $$ \operatorname{var}\left(X_1+X_2\right) = \operatorname{var}\left(X_1\right) +\operatorname{var}\left(X_2\right) +2\operatorname{cov}(X_1,X_2)\geq0. $$ Thus, the smallest possible covariance is $$ \operatorname{cov}(X_1,X_2)=-\frac{\operatorname{var}\left(X_1\right) +\operatorname{var}\left(X_2\right) }{2}. $$
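
To connect this back to the numbers in the question (a quick substitution, using the matrix entries reported there): the covariance $\operatorname{cov}(\beta_2,\beta_3) = -8.701802\times10^{-7}$ sits well above this lower bound of $-(5.457572\times10^{-7} + 2.004982\times10^{-5})/2 \approx -1.03\times10^{-5}$, and plugging the diagonal entries for $\beta_2$ and $\beta_3$ directly into the formula gives $$5.457572\times10^{-7} + 2.004982\times10^{-5} + 2\,(-8.701802\times10^{-7}) \approx 1.886\times10^{-5} > 0,$$ that is, $SE_{\beta_2+\beta_3} \approx 4.34\times10^{-3}$. The negative value in the question therefore comes from substituting the wrong entries (the first two terms there do not match $\operatorname{var}(\beta_2)$ and $\operatorname{var}(\beta_3)$ on the diagonal), not from an invalid covariance matrix.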
