Solved – Covariance and Correlation in Regression Model

correlation, covariance, regression, self-study

I've been looking over some regression models lately and came across one that, although similar, differs from the "standard" simple linear model in that the predictor is centered at its mean. I was hoping somebody could help with some properties that I'm confused about.

Assuming the regression form:

$y_{i} = \beta_{0} + \beta_{1}(x_{i}-\bar{x}) + \epsilon_{i}$

with expected value:

${\bf E}[y_{i}] = \beta_{0} + \beta_{1}(x_{i}-\bar{x})$

and fitted values:

$\hat{y}_{i} = \hat{\beta}_{0} + \hat{\beta}_{1}(x_{i}-\bar{x})$

where $\hat{\beta}_{0} = \bar{y}$ and $\hat{\beta}_{1} = \frac{S_{XY}}{S_{XX}}$
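For reference, expanding $\hat{\beta}_{1}$ with $S_{XY} = \sum_{j}(x_{j}-\bar{x})(y_{j}-\bar{y})$ and using $\sum_{j}(x_{j}-\bar{x}) = 0$ shows that it is a linear combination of the responses:

$\hat{\beta}_{1} = \frac{\sum_{j}(x_{j}-\bar{x})(y_{j}-\bar{y})}{S_{XX}} = \frac{\sum_{j}(x_{j}-\bar{x})\,y_{j}}{\sum_{j}(x_{j}-\bar{x})^{2}}$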

and, from what I've worked out:

${\bf E}[\hat{\beta}_{0}] = \beta_{0}$, ${\bf E}[\hat{\beta}_{1}] = \beta_{1}$

and:

$\text{Var}(y_{i}) = \sigma^{2}$, $\text{Var}(\hat{\beta}_{0}) = \frac{\sigma^{2}}{n}$, $\text{Var}(\hat{\beta}_{1}) = \frac{\sigma^{2}}{S_{XX}}$
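These follow from writing each estimator as a linear combination of the independent $y_{j}$; for example:

$\text{Var}(\hat{\beta}_{0}) = \text{Var}\left(\frac{1}{n}\sum_{j} y_{j}\right) = \frac{n\sigma^{2}}{n^{2}} = \frac{\sigma^{2}}{n} \qquad \text{and} \qquad \text{Var}(\hat{\beta}_{1}) = \frac{\sigma^{2}\sum_{j}(x_{j}-\bar{x})^{2}}{S_{XX}^{2}} = \frac{\sigma^{2}}{S_{XX}}$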

How can it be shown that:

(a)

$\text{Cov}(y_{i}, \hat{\beta}_{1}) = \frac{\sigma^{2}(x_{i}-\bar{x})}{\sum_{j} (x_{j}-\bar{x})^{2}}$

I know that the covariance formula is given by:

$\text{Cov}(y_{i}, \hat{\beta}_{1}) = {\bf E}[(y_{i} - {\bf E}[y_{i}])(\hat{\beta}_{1} - {\bf E}[\hat{\beta}_{1}])]$

I'm guessing that, to yield this result, the covariance formula somehow reduces to the form:

$\text{Cov}(y_{i}, \hat{\beta}_{1}) = (x_{i}-\bar{x})\,\text{Var}(\hat{\beta}_{1})$

since this would give:

$\text{Cov}(y_{i}, \hat{\beta}_{1}) = (x_{i}-\bar{x})\,\frac{\sigma^{2}}{S_{XX}} = \frac{\sigma^{2}(x_{i}-\bar{x})}{\sum_{j} (x_{j}-\bar{x})^{2}}$

However, although I have tried to do this, I'm confused about how to manipulate the covariance formula to yield the desired result.

(b)

$\text{Corr}(\hat{\beta}_{0}, \hat{\beta}_{1}) = 0$

Here, I know that if it can be shown that:

$\text{Cov}(\hat{\beta}_{0}, \hat{\beta}_{1}) = 0$

it follows that:

$\text{Corr}(\hat{\beta}_{0}, \hat{\beta}_{1}) = 0$
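This step is just the definition of correlation, since both variances are strictly positive:

$\text{Corr}(\hat{\beta}_{0}, \hat{\beta}_{1}) = \frac{\text{Cov}(\hat{\beta}_{0}, \hat{\beta}_{1})}{\sqrt{\text{Var}(\hat{\beta}_{0})\,\text{Var}(\hat{\beta}_{1})}}$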

However, as in part (a), I'm confused about how to develop the covariance formula accordingly.

Best Answer

Here are some hints:

For (a), try substituting the fitted decomposition $y_i = \hat{\beta}_0 + \hat{\beta}_1(x_i - \bar{x}) + e_i$, where $e_i$ is the residual. By bilinearity of covariance you get:

\begin{equation} \text{Cov}(y_i,\hat{\beta}_1) = \text{Cov}(\hat{\beta}_0,\hat{\beta}_1) + (x_i - \bar{x})\,\text{Cov}(\hat{\beta}_1,\hat{\beta}_1) + \text{Cov}(e_i,\hat{\beta}_1). \end{equation}

The last term vanishes because the residuals are uncorrelated with the coefficient estimates, so to get this to what you want you need to figure out part (b), i.e. to show that $\text{Cov}(\hat{\beta}_0,\hat{\beta}_1) = 0$.

For (b), remember that asymptotically:

\begin{equation} (\hat{\beta}_0,\hat{\beta}_1)^T \sim \text{MVN}_2\!\left( \boldsymbol{\beta},\, I_E(\boldsymbol{\beta})^{-1}\right), \end{equation}

where $\boldsymbol{\beta} = (\beta_0,\beta_1)^T$ and $I_E(\boldsymbol{\beta})$ denotes the expected Fisher information matrix, given by the expectation of the negative second derivative of the log-likelihood (if you haven't seen this before, see http://en.wikipedia.org/wiki/Fisher_information). If you look at the off-diagonal entries of $I_E^{-1}$ (or, equivalently here, of $I_E$), you can read off the covariance between the two parameter estimates, which lets you answer both (a) and (b). (By the way, $I_E(\boldsymbol{\beta}) \approx I_E(\hat{\boldsymbol{\beta}})$ provided you have a decent sample size.)
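Alternatively, both covariances can be computed directly from the linear-combination form of the estimators; here is a sketch, assuming the usual $\text{Cov}(y_i, y_j) = \sigma^2\delta_{ij}$ and using $\sum_j (x_j-\bar{x}) = 0$:

\begin{align} \text{Cov}(y_i, \hat{\beta}_1) &= \text{Cov}\!\left(y_i,\ \frac{\sum_j (x_j-\bar{x})\,y_j}{S_{XX}}\right) = \frac{(x_i-\bar{x})\,\sigma^2}{S_{XX}}, \\ \text{Cov}(\hat{\beta}_0, \hat{\beta}_1) &= \text{Cov}\!\left(\frac{1}{n}\sum_i y_i,\ \frac{\sum_j (x_j-\bar{x})\,y_j}{S_{XX}}\right) = \frac{\sigma^2\sum_j (x_j-\bar{x})}{n\,S_{XX}} = 0. \end{align}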

Hope it helps.
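If you want to sanity-check the identities numerically, here is a minimal Monte Carlo sketch in Python (the design points and parameter values below are arbitrary choices for illustration, not part of the problem):

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical design and true parameters (illustrative values only)
n, reps = 20, 200_000
x = np.linspace(0.0, 10.0, n)
xc = x - x.mean()                    # centered predictor (x_i - xbar)
beta0, beta1, sigma = 2.0, 0.5, 1.0
Sxx = np.sum(xc ** 2)

# Simulate y = beta0 + beta1 * (x - xbar) + eps, with eps ~ N(0, sigma^2)
eps = rng.normal(0.0, sigma, size=(reps, n))
y = beta0 + beta1 * xc + eps

# Least-squares estimates for each replication
b0_hat = y.mean(axis=1)                  # beta0_hat = ybar
b1_hat = (y * xc).sum(axis=1) / Sxx      # beta1_hat = S_XY / S_XX

i = 3  # pick one observation index
print("Cov(y_i, b1_hat):", np.cov(y[:, i], b1_hat)[0, 1],
      "theory:", sigma**2 * xc[i] / Sxx)
print("Cov(b0_hat, b1_hat):", np.cov(b0_hat, b1_hat)[0, 1], "theory: 0")
print("Var(b0_hat):", b0_hat.var(), "theory:", sigma**2 / n)
print("Var(b1_hat):", b1_hat.var(), "theory:", sigma**2 / Sxx)
```

With this many replications, the empirical covariances and variances should match the theoretical values to a couple of decimal places.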