First, create the model:
library(data.table)  # for fread()
data <- fread(paste0("http://www1.aucegypt.edu/faculty/hadi/RABE5/Data5/", "P060.txt"))
model <- lm(Y ~ X1 + X3, data = data)
Then you can use the following code:
library(car)
# Jointly test the two restrictions X1 = X3 and X1 = 0.5
linearHypothesis(model, c("X1=X3", "X1=0.5"))
You will get the same output with less code and hassle.
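For comparison, here is a sketch of the longer manual route that linearHypothesis() wraps: the two restrictions together force $\beta_1 = \beta_3 = 0.5$, so you can fit the restricted model with an offset and compare it to the full model with anova().

# Manual equivalent (sketch): both restrictions fix the coefficients
# of X1 and X3 at 0.5, so absorb them into an offset
restricted <- lm(Y ~ 1 + offset(0.5 * X1 + 0.5 * X3), data = data)
anova(restricted, model)  # same F-test as linearHypothesis()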
In general these regressions are not the same, but since you have simulated the $X$s to be independent it works out.
Consider the full regression
\begin{align*}
Y = \alpha + X_1\beta_1 + X_2\beta_2 + X_3\beta_3 +\epsilon
\end{align*}
Let me define an $n\times 3$ matrix $Z = [1,X_1,X_2]$: a column of ones followed by the values of $X_1$ and $X_2$. Let $M_z = I - Z(Z^TZ)^{-1}Z^T$ be the residual-making projection matrix. In other words, applying this matrix to a variable gives the residuals from regressing that variable on a column of ones, $X_1$ and $X_2$.
\begin{equation}
\hat{\beta}_3 = \frac{X_3^TM_zY}{X_3^TM_zX_3}
\end{equation}
The above formula follows from the Frisch–Waugh–Lovell (FWL) theorem. You could derive the same result by minimizing the sum of squared residuals directly, but the matrix notation and the FWL theorem make things much cleaner and, in this case, give us insight into the question you asked.
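To make this concrete, here is a quick numerical check of the formula on simulated data (a sketch; the variable names and coefficient values are just illustrative):

set.seed(1)
n  <- 200
X1 <- rnorm(n); X2 <- rnorm(n); X3 <- rnorm(n)
Y  <- 1 + 0.5 * X1 - 0.3 * X2 + 2 * X3 + rnorm(n)

Z   <- cbind(1, X1, X2)                            # n x 3 matrix [1, X1, X2]
M_z <- diag(n) - Z %*% solve(t(Z) %*% Z) %*% t(Z)  # residual-making matrix

# FWL formula for beta_3 ...
beta3_fwl <- drop((t(X3) %*% M_z %*% Y) / (t(X3) %*% M_z %*% X3))
# ... matches the coefficient from the full regression
beta3_lm  <- coef(lm(Y ~ X1 + X2 + X3))["X3"]
c(beta3_fwl, beta3_lm)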
Recall that in a simple regression through the origin, the OLS estimate is $\hat{\beta} = \frac{X^TY}{X^TX}$. You can show for yourself using this formula that the $\hat{\beta}_3$ we derived above is the same as the estimate from the following simple regression:
\begin{align*}
M_zY = M_zX_3\beta_3 + \epsilon
\end{align*}
(Note: $(M_z)^TM_z = M_z$ by the properties of orthogonal projection matrices. A second note: I wrote $\epsilon$ again because the residuals are numerically identical by FWL; in other words, this is the exact same regression.)
So, to answer the question: it would be the same as residualizing $Y$ by regressing it on a constant, $X_1$ and $X_2$, and then regressing these residuals on the residuals of $X_3$ regressed on a constant, $X_1$ and $X_2$.
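Using lm() on the simulated data above, the two-step residual regression reproduces the full-regression coefficient:

eY <- resid(lm(Y ~ X1 + X2))    # Y residualized on a constant, X1, X2
e3 <- resid(lm(X3 ~ X1 + X2))   # X3 residualized the same way
coef(lm(eY ~ 0 + e3))           # equals beta3_lm from the full regression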
In your simulation you generated all the $X$s as independent standard normal variables. When you regress $X_3$ on the others, the fitted coefficients are zero in expectation, so residualizing $X_3$ changes (almost) nothing and the shortcut works. In the real data, my guess is that $X_3$ is not mean zero and has some real relationship with the other two variables, so it will not work.
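Continuing the simulated example, a sketch of the contrast between the two cases: with independent regressors the simple regression of $Y$ on $X_3$ roughly recovers $\hat{\beta}_3$, but once $X_3$ is shifted and correlated with $X_1$ it does not:

# Independent X3 (as in the simulation): the estimates roughly agree
coef(lm(Y ~ X3))["X3"]                  # close to beta3_lm above
# Shifted, correlated X3: the simple regression no longer recovers beta_3
X3c <- X3 + 0.8 * X1 + 1
Yc  <- 1 + 0.5 * X1 - 0.3 * X2 + 2 * X3c + rnorm(n)
c(coef(lm(Yc ~ X3c))["X3c"], coef(lm(Yc ~ X1 + X2 + X3c))["X3c"])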
@Glen_b already provided a link to the discussion containing the theoretical aspects.
Here is a quick practical example of how one would do it in R. Please also have a look at these documents, which contain the theory as well as examples: Simultaneous Inference in General Parametric Models and Additional multcomp Examples.
We will use the mtcars dataset and build a linear regression model with three variables: cyl (number of cylinders), disp (displacement) and hp (horsepower) to predict the variable mpg (miles/gallon). Then, we test the following hypothesis: $\beta_{\mathrm{cyl}}+\beta_{\mathrm{disp}}-2\cdot\beta_{\mathrm{hp}} = 0$.
Using the multcomp package, there are two ways of specifying the hypothesis: as a matrix of linear functions $\mathbf{K}$ or by a symbolic description; I included both versions in the code below. In our example, the matrix would simply be a row vector: $\mathbf{K} = (0, 1, 1, -2)$. The zero at the beginning is necessary because our regression model includes an intercept.
Specifying the hypothesis by symbolic description means that you can simply state it as a character string, in this case "cyl + disp - 2*hp = 0".
In this example, the estimate of our hypothesis is $-1.2169$, with little evidence that it differs from $0$. The function confint is used to generate a confidence interval for the estimate: $(-2.86; 0.43)$.