We have an equation with $\alpha = 0$, so the simple linear regression is $$ Y_i=\beta X_i+u_i$$ and the question was: derive the OLS estimator of $\beta$ and comment on its variance. In this situation, should I have derived the OLS estimator in the same way as for a regression with an intercept, obtaining $$ \hat\beta =\frac{\operatorname{Cov}(X,Y)}{\operatorname{Var}(X)}, $$ or should I have done something different?
Solved – Is the derivation of OLS different if we do not have an intercept?
least squares, regression, self-study
Related Solutions
The estimator for the variance commonly used in regression does not come from the least squares principle, which only produces an estimate of $\boldsymbol{\beta}$. It is just a bias-corrected version (by the factor $\frac{n}{n-K}$) of the empirical variance
$$\widehat{\sigma}^2 = \frac{1}{n} \sum_{i=1}^n \left(y_i - \mathbf{x}_i^{T} \widehat{\boldsymbol{\beta}} \right)^2,$$ which in turn is the maximum likelihood estimator of $\sigma^2$ under the assumption of a normal distribution. It is confusing that many people call this the OLS estimator of the variance.
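To make the distinction concrete, here is a minimal NumPy sketch (the simulated data and variable names are my own, not from the original answer) computing the maximum likelihood estimate $\frac{1}{n}\sum_i \hat u_i^2$ and its bias-corrected counterpart, which divides by $n - K$ instead:

```python
import numpy as np

rng = np.random.default_rng(0)
n, K = 200, 3                                   # sample size, number of regressors (incl. intercept)

X = np.column_stack([np.ones(n), rng.normal(size=(n, K - 1))])
beta = np.array([1.0, 2.0, -0.5])
sigma2 = 4.0
y = X @ beta + rng.normal(scale=np.sqrt(sigma2), size=n)

# OLS coefficients via the normal equations, then residuals
beta_hat = np.linalg.solve(X.T @ X, X.T @ y)
resid = y - X @ beta_hat

sigma2_mle = resid @ resid / n                  # (1/n) * RSS: the ML estimator under normality
sigma2_corrected = resid @ resid / (n - K)      # bias-corrected: the MLE scaled by n/(n-K)

print(sigma2_mle, sigma2_corrected)             # both should be near the true value 4.0
```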
This is one of those theorems that is easier to prove in greater generality using vector algebra than it is to prove with scalar algebra. To do this, consider the multiple linear regression model $\mathbf{Y} = \mathbf{x} \boldsymbol{\beta} + \boldsymbol{\varepsilon}$ and consider the general linear estimator:
$$\hat{\boldsymbol{\beta}}_\mathbf{A} = \hat{\boldsymbol{\beta}}_\text{OLS} + \mathbf{A} \mathbf{Y} = [(\mathbf{x}^\text{T} \mathbf{x})^{-1} \mathbf{x}^\text{T} + \mathbf{A}] \mathbf{Y}.$$
Since the OLS estimator is unbiased and $\mathbb{E}(\mathbf{Y}) = \mathbf{x} \boldsymbol{\beta}$ this general linear estimator has bias:
$$\begin{align} \text{Bias}(\hat{\boldsymbol{\beta}}_\mathbf{A}, \boldsymbol{\beta}) &\equiv \mathbb{E}(\hat{\boldsymbol{\beta}}_\mathbf{A}) - \boldsymbol{\beta} \\[6pt] &= \mathbb{E}(\hat{\boldsymbol{\beta}}_\text{OLS} + \mathbf{A} \mathbf{Y}) - \boldsymbol{\beta} \\[6pt] &= \boldsymbol{\beta} + \mathbf{A} \mathbf{x} \boldsymbol{\beta} - \boldsymbol{\beta} \\[6pt] &= \mathbf{A} \mathbf{x} \boldsymbol{\beta}, \\[6pt] \end{align}$$
and so the requirement of unbiasedness imposes the restriction that $\mathbf{A} \mathbf{x} = \mathbf{0}$. The variance of the general linear estimator is:
$$\begin{align} \mathbb{V}(\hat{\boldsymbol{\beta}}_\mathbf{A}) &= \mathbb{V}([(\mathbf{x}^\text{T} \mathbf{x})^{-1} \mathbf{x}^\text{T} + \mathbf{A}] \mathbf{Y}) \\[6pt] &= [(\mathbf{x}^\text{T} \mathbf{x})^{-1} \mathbf{x}^\text{T} + \mathbf{A}] \mathbb{V}(\mathbf{Y}) [(\mathbf{x}^\text{T} \mathbf{x})^{-1} \mathbf{x}^\text{T} + \mathbf{A}]^\text{T} \\[6pt] &= \sigma^2 [(\mathbf{x}^\text{T} \mathbf{x})^{-1} \mathbf{x}^\text{T} + \mathbf{A}] [(\mathbf{x}^\text{T} \mathbf{x})^{-1} \mathbf{x}^\text{T} + \mathbf{A}]^\text{T} \\[6pt] &= \sigma^2 [(\mathbf{x}^\text{T} \mathbf{x})^{-1} \mathbf{x}^\text{T} + \mathbf{A}] [\mathbf{x} (\mathbf{x}^\text{T} \mathbf{x})^{-1} + \mathbf{A}^\text{T}] \\[6pt] &= \sigma^2 [(\mathbf{x}^\text{T} \mathbf{x})^{-1} \mathbf{x}^\text{T} \mathbf{x} (\mathbf{x}^\text{T} \mathbf{x})^{-1} + (\mathbf{x}^\text{T} \mathbf{x})^{-1} \mathbf{x}^\text{T} \mathbf{A}^\text{T} + \mathbf{A} \mathbf{x} (\mathbf{x}^\text{T} \mathbf{x})^{-1} + \mathbf{A} \mathbf{A}^\text{T}] \\[6pt] &= \sigma^2 [(\mathbf{x}^\text{T} \mathbf{x})^{-1} + (\mathbf{x}^\text{T} \mathbf{x})^{-1} (\mathbf{A} \mathbf{x})^\text{T} + (\mathbf{A} \mathbf{x}) (\mathbf{x}^\text{T} \mathbf{x})^{-1} + \mathbf{A} \mathbf{A}^\text{T}] \\[6pt] &= \sigma^2 [(\mathbf{x}^\text{T} \mathbf{x})^{-1} + (\mathbf{x}^\text{T} \mathbf{x})^{-1} \mathbf{0}^\text{T} + \mathbf{0} (\mathbf{x}^\text{T} \mathbf{x})^{-1} + \mathbf{A} \mathbf{A}^\text{T}] \\[6pt] &= \sigma^2 [(\mathbf{x}^\text{T} \mathbf{x})^{-1} + \mathbf{A} \mathbf{A}^\text{T}]. \\[6pt] \end{align}$$
Hence, we have:
$$\mathbb{V}(\hat{\boldsymbol{\beta}}_\mathbf{A}) - \mathbb{V}(\hat{\boldsymbol{\beta}}_\text{OLS}) = \sigma^2 \mathbf{A} \mathbf{A}^\text{T}.$$
Now, since $\mathbf{A} \mathbf{A}^\text{T}$ is a positive semi-definite matrix, we can see that the variance of the general linear estimator is minimised when $\mathbf{A} = \mathbf{0}$, which yields the OLS estimator.
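As a quick numerical sanity check on this result (not part of the original argument; the design matrix and the particular $\mathbf{A}$ below are my own choices), one can build an $\mathbf{A}$ satisfying the unbiasedness restriction $\mathbf{A}\mathbf{x} = \mathbf{0}$ and verify that the variance difference $\sigma^2 \mathbf{A}\mathbf{A}^\text{T}$ is positive semi-definite:

```python
import numpy as np

rng = np.random.default_rng(1)
n, p = 50, 3
sigma2 = 2.0

x = rng.normal(size=(n, p))                     # design matrix (x in the derivation above)
XtX_inv = np.linalg.inv(x.T @ x)

# Construct A with A x = 0 by projecting arbitrary rows onto the orthogonal
# complement of the column space of x (M is the "residual maker", M x = 0).
M = np.eye(n) - x @ XtX_inv @ x.T
A = rng.normal(size=(p, n)) @ M
print(np.max(np.abs(A @ x)))                    # ~0: the unbiasedness restriction holds

var_ols = sigma2 * XtX_inv
var_general = sigma2 * (XtX_inv + A @ A.T)

# The difference sigma^2 * A A' should be positive semi-definite:
eigenvalues = np.linalg.eigvalsh(var_general - var_ols)
print(eigenvalues.min() >= -1e-10)              # True, up to floating-point error
```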
Best Answer
That is not correct if there is no intercept. Without an intercept, the population least-squares coefficient is:
$$ \beta = \frac{\operatorname{E}[XY]}{\operatorname{E}[X^2]}$$
In the case of a finite sample, your estimate would be:
$$ \hat{\beta} = \frac{\sum_{i=1}^n x_i y_i }{\sum_{i=1}^n x_i^2} $$
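For completeness, here is the standard textbook derivation of that estimator (it is not spelled out explicitly above): apply the same least-squares recipe, minimise $\sum_{i=1}^n (y_i - b x_i)^2$ over $b$, and set the derivative to zero,

$$ \frac{\partial}{\partial b} \sum_{i=1}^n (y_i - b x_i)^2 = -2 \sum_{i=1}^n x_i (y_i - b x_i) = 0 \quad\Longrightarrow\quad \hat{\beta} = \frac{\sum_{i=1}^n x_i y_i}{\sum_{i=1}^n x_i^2}. $$

So the derivation follows the same steps as with an intercept; what changes is that $x$ and $y$ are no longer centred around their means, which is why the $\operatorname{Cov}(X,Y)/\operatorname{Var}(X)$ form does not appear.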
The general case (skip over this if you don't know matrix algebra yet)
If you know matrix algebra, all these are special cases. Minimizing the sum of squares can be written as:
$$ \text{minimize (over $\mathbf{b}$) } \quad\left( \mathbf{y} - X \mathbf{b}\right)'\left( \mathbf{y} - X \mathbf{b}\right) $$
which has the solution:
$$ \hat{\mathbf{b}} = (X'X)^{-1} X'\mathbf{y}$$
The algebra behind that can be found (among numerous places) in my answer here.
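As a small illustration (a minimal sketch with simulated data of my own choosing; `np.linalg.lstsq` is used only as a cross-check), the matrix formula reproduces the no-intercept special case when $X$ is the single column of $x_i$ values, and adding an intercept is just a matter of prepending a column of ones:

```python
import numpy as np

rng = np.random.default_rng(2)
n = 100
x = rng.normal(size=n)
y = 1.5 * x + rng.normal(size=n)

# No-intercept case: X has a single column, so (X'X)^{-1} X'y collapses to sum(x*y)/sum(x^2)
X = x.reshape(-1, 1)
b_matrix = np.linalg.solve(X.T @ X, X.T @ y)[0]
b_scalar = np.sum(x * y) / np.sum(x ** 2)
b_lstsq = np.linalg.lstsq(X, y, rcond=None)[0][0]
print(b_matrix, b_scalar, b_lstsq)              # all three agree

# With an intercept: prepend a column of ones and use exactly the same formula
X1 = np.column_stack([np.ones(n), x])
print(np.linalg.solve(X1.T @ X1, X1.T @ y))     # [intercept, slope]
```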