Solved – the null hypothesis for the individual p-values in multiple regression

Tags: p-value, regression

I have a linear regression model for a dependent variable $Y$ based on two independent variables, $X_1$ and $X_2$, so the general form of the regression equation is

$Y = A + B_1 \cdot X_1 + B_2 \cdot X_2 + \epsilon$,

where $A$ is the intercept, $\epsilon$ is the error term, and $B_1$ and $B_2$ are the respective coefficients of $X_1$ and $X_2$. I fit the multiple regression with software (statsmodels in Python) and obtain estimates $A = a$, $B_1 = b_1$, $B_2 = b_2$. The model also gives me a $p$-value for each coefficient: $p_a$, $p_1$, and $p_2$. My question is: what is the null hypothesis behind each of these individual $p$-values? For example, I know that the null hypothesis for $p_1$ sets the coefficient $B_1$ to 0, but what about the other parameters? In other words, if the null hypothesis is $Y = A + 0 \cdot X_1 + B_2 \cdot X_2 + \epsilon$, what values of $A$ and $B_2$ does that null hypothesis assume when the $p$-value for $B_1$ is computed?

Best Answer

The null hypothesis is $$ H_0: B_1 = 0 \: \text{and} \: B_2 \in \mathbb{R} \: \text{and} \: A \in \mathbb{R}, $$ which means that the null hypothesis places no restriction on $B_2$ and $A$. The alternative hypothesis is $$ H_1: B_1 \neq 0 \: \text{and} \: B_2 \in \mathbb{R} \: \text{and} \: A \in \mathbb{R}. $$ In this sense, the null hypothesis in the multiple regression model is a composite hypothesis. It is "fortunate" that we can construct a pivotal test statistic whose null distribution does not depend on the true values of $B_2$ and $A$, so that we pay no penalty for testing a composite null hypothesis.
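Concretely (a standard detail, not spelled out above), the pivotal statistic in question is the usual $t$-ratio for the estimated coefficient; with normal errors it follows, under $H_0$, a $t$ distribution with $n - 3$ degrees of freedom ($n$ observations, three estimated coefficients), whatever the true $A$ and $B_2$ are:

$$ t_1 = \frac{b_1}{\widehat{\mathrm{se}}(b_1)} \sim t_{n-3} \quad \text{under } H_0, \qquad p_1 = 2 \, P\!\left( t_{n-3} \geq |t_1| \right). $$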

In other words, many different distributions of $(Y, X_1, X_2)$ are compatible with the null hypothesis $H_0$. However, all of these distributions lead to the same behavior of the test statistic that is used to test $H_0$.

In my answer, I have not addressed the distribution of $\epsilon$; I implicitly assumed that it is independent of the regressors and normally distributed with mean zero. If we only assume something like $$ E[\epsilon \mid X_1, X_2] = 0, $$ then a similar conclusion holds asymptotically (under regularity assumptions).
