Solved – Variance vs Standard Deviation vs SE vs Var(beta hat)

multiple regression, variance

How I understand it:

Sum of squares is:

$$(y_i - \bar y)^2$$

Variance is:

$$\frac{(y_i - \bar y)^2}{n}$$

When the variance is estimated from a sample:

$$\frac{(y_i - \bar y)^2}{n - 1}$$

Standard deviation is the square root of the variance:

$$\sqrt{\frac{(y_i - \bar y)^2}{n}}$$

Sample standard deviation is the square root of the sample variance:

$$\sqrt{\frac{(y_i - \bar y)^2}{n - 1}}$$

Is this correct?

Now, standard error is where I start to get a bit confused.

$SE = \frac{s}{\sqrt{n}}$, where $s$ is the sample standard deviation, which is $\sqrt{\frac{(y_i - \bar y)^2}{n - 1}}$.

So why is the standard error the standard deviation divided by the square root of $n$?

Is $s$ as I have defined it?

My next question is about multiple regression, where it talks about finding the variance of $\hat{\beta}$ (beta-hat). I don't even know what that is talking about, as there are multiple parameters. I am guessing that it is using beta-hat as a stand-in for all the parameters in the model?

I don't understand the following:

  • It says that the estimate is the "variance of the unknown errors" multiplied by the identity matrix

It says that it is equal to $\frac{e'e}{n-p} = \frac{SE}{n-p}$

So I assume that this is some sort of derivation of the variance formula for multiple regression parameters.

Is the $p$ a replacement for the "1" in the other sample formula? What is $SE$ in the above case?

What is "variance of the unknown errors"?

When I check the SSE for my model, I use the following R commands:

SSE <- sum(resid(model1)^2)   # sum of squared residuals
n   <- length(resid(model1))  # number of observations
p   <- length(coef(model1))   # number of estimated parameters

SSE / (n - p)

So this is saying that it is the sum of squared errors divided by $n - p$, which is closest to $s^2$, or the variance, but the formula is slightly different. So what exactly is going on here?
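For comparison (assuming model1 has an intercept, so the residuals average to zero), the usual sample variance of the residuals divides by $n - 1$ instead of $n - p$, so the two numbers come out slightly different:

var(resid(model1))   # divides by n - 1
SSE / (n - p)        # divides by n - p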

Best Answer

I do not have the time to give a detailed answer, but since no one has helped you so far, I will give some hints.

"Is this correct?" and "Is $s$ as I have defined it?"

Your formulas seem to be fine; they just lack the summations, as noted by @Gilles.
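For instance, written out with the summation, the sample variance is

$$s^2 = \frac{1}{n - 1} \sum_{i=1}^{n} (y_i - \bar y)^2.$$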

So why is the standard error the standard deviation divided by the square root of $n$?

Standard error is useful for significance testing; it is used in $t$-tests, and without the division by $\sqrt{n}$ the $t$-tests would not work. The intuition is that the standard error measures the variability of the sample mean $\bar y$ rather than of the individual observations, and averaging $n$ observations shrinks the variance by a factor of $n$. You should be able to find more information in econometrics textbooks.
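A short derivation, assuming the $y_i$ are independent with common variance $\sigma^2$:

$$\operatorname{Var}(\bar y) = \operatorname{Var}\left(\frac{1}{n}\sum_{i=1}^{n} y_i\right) = \frac{1}{n^2}\sum_{i=1}^{n}\operatorname{Var}(y_i) = \frac{\sigma^2}{n}, \quad \text{so} \quad SE(\bar y) = \frac{\sigma}{\sqrt{n}}.$$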

I am guessing that it is using beta-hat as a substitute for all parameters in the model?

That could very well be the case. $\hat{\beta}$ would be a parameter vector rather than a single parameter. Its variance would be a matrix rather than a single number.

"variance of the unknown errors" <--- don't know what that means

In a linear regression model of the form $y=\beta_0+\beta_1 x_1+\dotsb+\beta_K x_K+\varepsilon$, the variance of the unknown errors is the variance of $\varepsilon$, denoted $\sigma^2$; it is a single number (not a matrix). It can be estimated from the model residuals as $\hat{\sigma}^2 = \frac{e'e}{n-p}$, which is exactly the quantity your R code computes.
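As a quick check in R (sticking with your model1; a sketch, not the only way to do it):

sigma2_hat <- sum(resid(model1)^2) / df.residual(model1)  # e'e / (n - p)
summary(model1)$sigma^2   # R's "residual standard error", squared; should match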

However, the variance of the linear regression parameter vector $\hat{\beta}$ is $\text{Var}(\hat{\beta}|X)=\sigma^2 (X^T X)^{-1}$, which is not the variance of the unknown errors multiplied by the identity matrix.
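You can verify this numerically in R; the sketch below again assumes your fitted model1:

X <- model.matrix(model1)      # design matrix
sigma2_hat <- sum(resid(model1)^2) / df.residual(model1)

sigma2_hat * solve(t(X) %*% X) # sigma^2 * (X'X)^{-1}
vcov(model1)                   # R's built-in estimate; should match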

A note on what you did not ask: if you want to really understand what is going on in OLS estimation applied to a linear regression model, you should take your time and study the subject carefully and with patience. Just getting your questions answered, like here, may not be useful in the long run.
