Solved – Linear regression without intercept – sampling variance of coefficient

Tags: distributions, regression, regression-coefficients, sampling

I am comparing linear regression with and without intercept for the general sampling case.
For this, I have $n$ samples of two correlated random variables $X \sim N\left(0,\sigma_X^2\right)$ and $Y \sim N\left(0, \sigma_Y^2\right)$ with correlation $\rho$.

For the random samples, I calculate the linear regression models with and without intercept
(1) $y_i=\alpha_0+\alpha_1x_i+\epsilon_{1,i}$ and
(2) $y_i=\beta_1x_i+\epsilon_{2,i}$

Using numerical experiments, I have found that $E[\hat\alpha_1] = E[\hat\beta_1]$, which seems logical to me. However, I have also found that $\text{Var}(\hat\alpha_1) \neq \text{Var}(\hat\beta_1)$, which I am currently trying to understand.

In another question of mine, I found that for the general sampling case $\text{Var}(\hat \alpha_1) = \frac{\sigma_Y^2}{\sigma_X^2} \frac{1-\rho^2}{n-3}$ for the model with intercept, and I am now trying to find $\text{Var}(\hat\beta_1)$.
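For reference, here is a minimal simulation sketch of the comparison (assuming numpy; the values of $n$, $\sigma_X$, $\sigma_Y$, and $\rho$ below are illustrative choices, not from the question):

```python
import numpy as np

# Illustrative parameters (assumed, not from the question).
n, sigma_X, sigma_Y, rho = 10, 1.0, 2.0, 0.5
reps = 100_000
rng = np.random.default_rng(0)

cov = [[sigma_X**2, rho * sigma_X * sigma_Y],
       [rho * sigma_X * sigma_Y, sigma_Y**2]]
xy = rng.multivariate_normal([0.0, 0.0], cov, size=(reps, n))
x, y = xy[..., 0], xy[..., 1]

# Slope with intercept: sample covariance / sample variance, per replication.
xc = x - x.mean(axis=1, keepdims=True)
yc = y - y.mean(axis=1, keepdims=True)
alpha1 = (xc * yc).sum(axis=1) / (xc**2).sum(axis=1)

# Slope without intercept: sum(x*y) / sum(x^2), per replication.
beta1 = (x * y).sum(axis=1) / (x**2).sum(axis=1)

print("means:", alpha1.mean(), beta1.mean())    # both close to rho*sigma_Y/sigma_X
print("variances:", alpha1.var(), beta1.var())  # noticeably different
print("formula for Var(alpha1):",
      (sigma_Y**2 / sigma_X**2) * (1 - rho**2) / (n - 3))
```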

Overall, I am therefore trying to find $\text{Var}(\hat\beta_1)=\text{Var}\left(\frac{\sum x_iy_i}{\sum x_i^2}\right)$.

The denominator is clearly gamma distributed, being $\sigma_X^2$ times a chi-squared random variable with $n$ degrees of freedom. However, the distribution of the numerator, a sum of products of normally distributed random variables, is difficult to handle, not to mention the ratio.

Calculating $\text{Var}(\hat\beta_1)=E[\hat\beta_1^2] - E[\hat\beta_1]^2$ isn't much easier, I think.

After spending hours in the local university library and searching research papers, I am turning to CrossValidated for help (again).

Does somebody know a way to calculate the variance in question?

Best Answer

In your problem, assuming joint-normality of the variables, you can write the joint distribution of a single data point $(X_i, Y_i)$ as:

$$\begin{bmatrix} X_i \\ Y_i \end{bmatrix} \sim N \left( \begin{bmatrix} 0 \\ 0 \end{bmatrix}, \begin{bmatrix} \sigma_X^2 & \rho \sigma_X \sigma_Y \\ \rho \sigma_X \sigma_Y & \sigma_Y^2 \end{bmatrix} \right).$$

With a bit of algebra, using the standard conditional distribution of the bivariate normal, $Y_i \mid X_i \sim N \left( \rho \frac{\sigma_Y}{\sigma_X} X_i, \ (1 - \rho^2) \sigma_Y^2 \right)$, the value $Y_i$ can be written equivalently as:

$$Y_i = \rho \frac{\sigma_Y}{\sigma_X} \cdot X_i + \sqrt{1 - \rho^2} \sigma_Y \cdot \varepsilon_i,$$

where $\varepsilon_i$ is an independent standard normal error term. Hence, the true regression model is:

$$Y_i = \beta_0 + \beta_1 \cdot X_i + \sigma \cdot \varepsilon_i, \qquad \beta_0 = 0, \quad \beta_1 = \rho \frac{\sigma_Y}{\sigma_X}, \quad \sigma = \sqrt{1 - \rho^2} \, \sigma_Y.$$
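As a quick sanity check (a sketch assuming numpy, with illustrative parameter values), generating $Y_i$ through this decomposition reproduces the target marginal variance and correlation:

```python
import numpy as np

rng = np.random.default_rng(1)
sigma_X, sigma_Y, rho, m = 1.0, 2.0, 0.5, 1_000_000  # illustrative values

X = rng.normal(0.0, sigma_X, size=m)
eps = rng.normal(size=m)  # independent standard normal error
Y = rho * (sigma_Y / sigma_X) * X + np.sqrt(1 - rho**2) * sigma_Y * eps

print("Var(Y):", Y.var(), "(target:", sigma_Y**2, ")")
print("corr(X, Y):", np.corrcoef(X, Y)[0, 1], "(target:", rho, ")")
```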


Simplifying the variance problem: When you estimate the coefficients of the model with an intercept term, you would expect the intercept estimate to be close to its true value of zero, which means you would also expect the estimated slope coefficients to be similar with or without the intercept term (as you have pointed out). However, the inclusion of an intercept term will tend to increase the variance of the estimated slope coefficient. You have:

$$X_i Y_i = \rho \frac{\sigma_Y}{\sigma_X} \cdot X_i^2 + \sqrt{1 - \rho^2} \sigma_Y \cdot X_i \varepsilon_i.$$

Defining $\boldsymbol{Z} \equiv \boldsymbol{X} / \sigma_X$ and $\boldsymbol{U} \equiv \| \boldsymbol{Z} \|^2$ allows us to write:

$$\hat{\beta}_1 = \frac{\sum_{i=1}^n X_i Y_i}{\sum_{i=1}^n X_i^2} = \rho \frac{\sigma_Y}{\sigma_X} + \sqrt{1 - \rho^2} \frac{\sigma_Y}{\sigma_X} \cdot \frac{\boldsymbol{Z} \cdot \boldsymbol{\varepsilon}}{\boldsymbol{U}}.$$
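This identity can be confirmed numerically on a single simulated sample (a sketch assuming numpy, with illustrative parameter values; both sides should agree up to floating-point error):

```python
import numpy as np

rng = np.random.default_rng(2)
n, sigma_X, sigma_Y, rho = 10, 1.0, 2.0, 0.5  # illustrative values

X = rng.normal(0.0, sigma_X, size=n)
eps = rng.normal(size=n)
Y = rho * (sigma_Y / sigma_X) * X + np.sqrt(1 - rho**2) * sigma_Y * eps

Z = X / sigma_X
U = Z @ Z

lhs = (X * Y).sum() / (X**2).sum()
rhs = rho * sigma_Y / sigma_X \
    + np.sqrt(1 - rho**2) * (sigma_Y / sigma_X) * (Z @ eps) / U
print(lhs, rhs)  # identical up to floating-point error
```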

Taking the variance of this expression gives:

$$\mathbb{V}{(\hat{\beta}_1)} = (1 - \rho^2) \frac{\sigma_Y^2}{\sigma_X^2} \mathbb{V}\left(\frac{\boldsymbol{Z} \cdot \boldsymbol{\varepsilon}}{\boldsymbol{U}} \right).$$

Now, the random vectors $\boldsymbol{Z}$ and $\boldsymbol{\varepsilon}$ in the variance operator are independent by construction (and $\boldsymbol{U}$ is a function of $\boldsymbol{Z}$ alone). The numerator is a sum of products of independent standard normal random variables, and the denominator is a chi-squared random variable with $n$ degrees of freedom.

I must confess that I am not sure where to go from here. I do not recognise the variance expression as any simple form, though maybe others will. In any case, I hope that gives you some progress towards what you want.
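In the meantime, the remaining variance term can at least be estimated by simulation (a sketch assuming numpy; $\boldsymbol{Z}$ and $\boldsymbol{\varepsilon}$ are independent standard normal $n$-vectors as above, with the value of $n$ chosen arbitrarily here):

```python
import numpy as np

rng = np.random.default_rng(3)
n, reps = 10, 200_000  # n is an illustrative choice

Z = rng.normal(size=(reps, n))
eps = rng.normal(size=(reps, n))
U = (Z**2).sum(axis=1)            # chi-squared with n degrees of freedom
ratio = (Z * eps).sum(axis=1) / U

print("estimated Var(Z.eps / U) for n =", n, ":", ratio.var())
```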