With a great deal of prodding (full credit to @whuber) I seem to have solved it:
We re-write $y = \beta_1X_1 + \beta_2X_2 + \epsilon$ as
$y = a(X_1 + X_2) + b(X_1 - X_2) + \epsilon$
$y = aX_1 + aX_2 + bX_1 - bX_2 + \epsilon$,
$y = (a + b)X_1 + (a - b)X_2 + \epsilon$, therefore
$\beta_1 = (a+b)$ and $\beta_2 = (a - b)$.
Solving for $a$ and $b$ we get:
$a = (\beta_1 + \beta_2)/2$ and $b = (\beta_1 - \beta_2)/2$
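A quick numerical sanity check of this reparameterization (a minimal sketch; the coefficients and data below are made up for illustration):

set.seed(1)
N  <- 200
x1 <- rnorm(N); x2 <- rnorm(N)
b1 <- 2; b2 <- 10                         # illustrative true coefficients
y  <- b1 * x1 + b2 * x2 + rnorm(N)
fit <- lm(y ~ I(x1 + x2) + I(x1 - x2) - 1)
coef(fit)                                 # estimates of a and b
c(a = (b1 + b2) / 2, b = (b1 - b2) / 2)   # should be close: 6 and -4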
What I estimate is equivalent to
$y = \delta(X_1 + X_2) + u,$
$u = b(X_1 - X_2) + \epsilon$. We know that the $k$th coefficient of a multiple regression can be recovered by first partialling out the regressors other than $k$. Thus, if I let $z = X_1 + X_2$ and $q = X_1 - X_2$, the regression coefficient $b$ in:
$y = az + bq + \epsilon$ can be obtained with the steps
(1) $y = \delta z + u$,
(2) $q = \lambda z + e$
(3) $u = be + \epsilon$. Substituting these expressions back in:
$y = \delta z + be + \epsilon$, plugging (3) into (1);
$y = \delta z + b(q - \lambda z) + \epsilon$, using (2) to replace $e$. Therefore:
$y = (\delta - b \lambda)z + bq + \epsilon$. Now we know that $(\delta - b\lambda ) = a = (\beta_1 + \beta_2)/2$, so we solve for $\delta$ which yields:
$\delta = (\beta_1 + \beta_2)/2 + \lambda (\beta_1 - \beta_2)/2$
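As a quick sanity check on steps (1)-(3) themselves, here is a minimal Frisch-Waugh-Lovell sketch (all data and coefficients below are made up for illustration): the residual-on-residual regression recovers the same $b$ as the full regression.

set.seed(2)
N  <- 500
x1 <- rnorm(N); x2 <- rnorm(N)
y  <- 2 * x1 + 10 * x2 + rnorm(N)    # illustrative true model
z  <- x1 + x2
q  <- x1 - x2
u <- resid(lm(y ~ z - 1))            # step (1): partial z out of y
e <- resid(lm(q ~ z - 1))            # step (2): partial z out of q
b_fwl  <- coef(lm(u ~ e - 1))        # step (3): residual-on-residual regression
b_full <- coef(lm(y ~ z + q - 1))["q"]
c(b_fwl, b_full)                     # identical, by Frisch-Waugh-Lovell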
Here is some R code to check the solution, or to play around with:
rep.func <- function(N = 100, b1 = 2, b2 = 10, s1 = 1, s2 = 5) {
  # simulate two predictors and the outcome
  x1  <- rnorm(N, 0, s1)
  x2  <- rnorm(N, 0, s2)
  eps <- rnorm(N)
  y   <- x1 * b1 + x2 * b2 + eps
  # sum and difference of the predictors
  z <- x1 + x2
  q <- x1 - x2
  # lambda from the auxiliary regression, delta from the short regression,
  # and the betas from the full regression
  lambda    <- lm(q ~ z - 1)$coefficients
  est.delta <- lm(y ~ z - 1)$coefficients
  est.betas <- lm(y ~ x1 + x2 - 1)$coefficients
  # delta derived from the formula above
  derived.delta <- sum(est.betas) / 2 + lambda * (est.betas[1] - est.betas[2]) / 2
  c("est.delta" = est.delta, "derived.delta" = derived.delta)
}
rep.func()
#> est.delta.z derived.delta.z
#> 9.77236 9.77236
library(ggplot2)
res <- t(replicate(1000, rep.func()))
all.equal(res[,1], res[,2])
#> [1] TRUE
ggplot(as.data.frame(res), aes(est.delta.z, derived.delta.z)) +
geom_point() + geom_smooth(method='lm') + theme_minimal()
Update for the case with P predictors, September 2020
Returning to this much later to generalize to the case with $P$ predictors, by request. The underlying principle is the same: we are going to
- re-parameterize the true data-generating process so that it contains a term for the quantity we are interested in,
- determine the values of the re-parameterized coefficients in terms of the original coefficients, and
- evaluate the effect of omitted-variable bias on the estimation of the re-parameterized coefficient.
With $P$ predictors the true data generating process is:
$$y = \sum_{p=1}^{P} \beta_p x_p + \epsilon$$.
This true data generating process can be re-parameterized in a way that is mathematically equivalent as
$$
y = a (\sum_{p=1}^{P} x_p) + \sum_{p=2}^{P} b_p (x_1 - x_p) + \epsilon \\
y = (a + \sum_{p=2}^{P} b_p) x_1 + \sum_{p=2}^{P}(a - b_p)x_p + \epsilon
$$
Now we can solve for the new parameters ($a$ and the $b_p$'s) in terms of the old parameters (the $\beta$'s).
$$
a + \sum_{p=2}^{P} b_p = \beta_1 \\
a - b_2 = \beta_2 \\
... \\
a - b_P = \beta_P
$$
Substituting $b_p = a - \beta_p$ from the last $P-1$ equations into the first gives $Pa = \sum_{p=1}^{P}\beta_p$, therefore
$$
a = \frac{\sum_{p=1}^{P} \beta_p}{P} = \bar{\beta} \\
b_p = \bar{\beta} - \beta_p.
$$
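A quick arithmetic check of this solution (the $\beta$'s below are arbitrary made-up values): reconstructing the original coefficients from $a$ and the $b_p$'s should give back the $\beta$'s.

betas <- c(2, -1, 0.5, 3, -4)            # illustrative coefficients
P     <- length(betas)
a     <- mean(betas)                     # a = beta_bar
b     <- a - betas[2:P]                  # b_p = beta_bar - beta_p
c(a + sum(b), a - b) - betas             # all zero: the betas are recovered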
Now let's simplify our life a little by setting
$$
z = \sum_{p=1}^{P}x_p \\
Q = \begin{bmatrix} x_1 - x_2 & \cdots & x_1 - x_P \end{bmatrix}
$$
so we can write
$$
y = \bar{\beta}z + QA' + \epsilon
$$
where $Q$ is the $N \times (P-1)$ matrix of column differences defined above and $A$ is a $(P-1)$-length row vector collecting the coefficients $b_2, \dots, b_P$.
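As a small sketch (with made-up data and coefficients), fitting this reparameterized model directly returns approximately $\bar{\beta}$ and the $b_p$'s:

set.seed(3)
N <- 2000; P <- 4
X     <- matrix(rnorm(N * P), nrow = N)
betas <- c(1, -2, 0.5, 3)                 # illustrative true coefficients
y     <- X %*% betas + rnorm(N)
z <- rowSums(X)
Q <- X[, 1] - X[, 2:P]                    # columns x1 - x_p, p = 2, ..., P
coef(lm(y ~ z + Q - 1))                   # approx. (beta_bar, b_2, ..., b_P)
c(mean(betas), mean(betas) - betas[2:P])  # theoretical values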
We now move to the third step: we are not actually estimating the model above; instead, we are estimating
$$
y = \delta z + u
$$
We just repeat the process above for the $P = 2$ case (the application of Frisch-Waugh-Lovell), but try not to lose track of dimensions:
First we partial $z$ out of each column of $Q$; with some abuse of notation,
$$
q_p = \lambda_p z + e_p
$$
so $\Lambda$ would be a vector of length $(P-1)$, and $e$ would be an $N$ by $(P-1)$ matrix.
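In R, this per-column partialling can be done with a single multivariate lm call (a sketch with made-up data; $z$ and $Q$ constructed as above):

set.seed(4)
N <- 1000; P <- 4
X <- matrix(rnorm(N * P), nrow = N)
z <- rowSums(X)
Q <- X[, 1] - X[, 2:P]
fit_q  <- lm(Q ~ z - 1)   # one regression per column of Q
lambda <- coef(fit_q)     # 1 x (P-1) matrix of lambda_p
e      <- resid(fit_q)    # N x (P-1) matrix of residuals e_p
dim(lambda); dim(e)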
Then we can re-write $u$ with $z$ removed:
$$
u = eA' + \epsilon
$$
Plugging terms back in:
$$
y = \delta z + eA' + \epsilon \\
y = \delta z + QA' - (\Lambda A') z + \epsilon \\
y = (\delta - \Lambda A') z + QA' + \epsilon
$$
Matching the coefficient on $z$ with the reparameterized model gives $\delta - \Lambda A' = \bar{\beta}$, therefore
$$
\delta = \bar{\beta} + \Lambda A'
$$
In words, this says that the coefficient in the short regression of $y$ on the sum of the predictors equals the average of the regression coefficients in the original model, plus a 'bias' term equal in magnitude to the inner product of the omitted coefficients in the re-parameterization and the regression coefficients from a regression of the omitted variables on the included variable.
Since it's always risky to get your math from randos on the internet, here's a simulation to support the derivation:
## Load mvtnorm because things are more interesting
# when predictors are correlated
library(mvtnorm)
library(ggplot2)
# create correlation matrix
set.seed(42)
rep_func <- function(N = 1000, P = 10) {
  # how the predictors are correlated
  sig <- matrix(round(runif(P^2), 2), nrow = P)
  diag(sig) <- 1:P
  sig[lower.tri(sig)] <- t(sig)[lower.tri(sig)]
  # Draw covariates
  X <- matrix(rmvnorm(N, mean = 1:P, sigma = sig), nrow = N)
  # draw true parameters
  betas <- rnorm(P)
  # draw true errors
  eps <- rnorm(N)
  # gen y
  y <- X %*% betas + eps
  # predictor sums
  Z <- rowSums(X)
  # create some helpers
  x1 <- X[, 1]
  Q <- x1 - X[, 2:P]
  # delta through regression of y on sum of predictors
  delta <- lm(y ~ Z - 1)$coef
  # true coefficients from reparameterization
  agg.reg <- lm(y ~ Z + Q - 1)$coef
  a <- agg.reg[1]
  A <- agg.reg[2:P]
  # lambdas
  lambda <- lm(Q ~ Z - 1)$coef
  # compare deltas
  delta_est <- sum(A * lambda) + mean(betas)
  c("lm_delta" = delta[[1]], "derived_delta" = delta_est)
}
rep_func()
#> lm_delta derived_delta
#> 0.3196803 0.3229088
res <- as.data.frame(t(replicate(1000, rep_func())))
#> Warning in rmvnorm(N, mean = 1:P, sigma = sig): sigma is numerically not
#> positive semidefinite
ggplot(res, aes(lm_delta - derived_delta)) + geom_density() +
geom_vline(xintercept = 0) +
hrbrthemes::theme_ipsum() +
xlim(c(-1*sd(res$lm_delta), 1*sd(res$lm_delta))) +
ggtitle("LM estimated delta - derived delta",
subtitle = "X-axis +/- 1 standard error of estimate")
Created on 2020-09-20 by the reprex package (v0.3.0)
Best Answer
The two approaches test completely different things. In other words, from a theoretical point of view, the hypothesis of interest is different in each case.
When performing the t-test, your hypothesis of interest is $$ H_0: \beta_i = 0\: \text{ vs }\: H_1: \beta_i\ne0, \quad i\in\{1,2\}, $$ whereas in the Wald test you're testing $$ H_0: \beta_1=\beta_2\: \text{ vs }\: H_1: \beta_1\ne\beta_2. $$ However, based on the way you formulated your question, it appears that your null hypothesis is something of the form $$ H_0: \begin{pmatrix}1&0\\0&1\\1&-1\end{pmatrix}\begin{pmatrix}\beta_1\\\beta_2\end{pmatrix}=\begin{pmatrix}0\\0\\0\end{pmatrix}, $$ and therefore you need to perform the Wald test for that null hypothesis.
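For concreteness, here is one way to carry out such a Wald test in R with a hand-rolled statistic for the single restriction $\beta_1 = \beta_2$ (the data and model are made up for illustration; car::linearHypothesis(fit, "x1 = x2") would give an equivalent F-form test):

set.seed(5)
n  <- 200
x1 <- rnorm(n); x2 <- rnorm(n)
y  <- 1 + 2 * x1 + 2.5 * x2 + rnorm(n)    # illustrative data
fit <- lm(y ~ x1 + x2)
b <- coef(fit)
V <- vcov(fit)
R <- matrix(c(0, 1, -1), nrow = 1)        # restriction: beta_1 - beta_2 = 0
W <- as.numeric(t(R %*% b) %*% solve(R %*% V %*% t(R)) %*% (R %*% b))
pchisq(W, df = nrow(R), lower.tail = FALSE)   # asymptotic chi-square p-value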
A final word regarding the example at hand: sometimes your inference may be flawed because you did not take certain features of your data into account when (1) estimating the parameters of interest or (2) choosing the critical values. For example, the normal (or $\chi^2$) approximation is not a great one if you have very few observations, since it is an asymptotic approximation.
Hope this helps.