Solved – Relationship between noise term ($\epsilon$) and MLE solution for Linear Regression Models

maximum likelihood, noise, normal distribution, regression

In linear regression models, given observed variables $x_1, x_2, x_3, …, x_k$, the predicted variable $y$, and model parameters $\beta_0, \beta_1, \beta_2, \beta_3, …, \beta_k$, the model can be written as
$$
y = \beta_0 + \beta_1x_1 + \beta_2x_2 + … + \beta_kx_k + \epsilon
$$
where, $\epsilon$ is the noise term.

In vector notation the same thing can be written down as:
$$
\mathbf{y} = \mathbf{X}\boldsymbol{\beta} + \boldsymbol{\epsilon}
$$

Now this can be solved using maximum-likelihood estimation over $\boldsymbol{\beta}$. That is:
$$
\hat{\boldsymbol{\beta}} = \operatorname{argmax}_{\boldsymbol{\beta}} \Pr(\mathbf{y}|\mathbf{X}, \boldsymbol{\beta})
$$
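A quick way to see what this optimization does in practice is a small simulation. The following is only a sketch (the data, the fixed noise level $\sigma_\epsilon$, and all numbers are made up for illustration): it maximizes the Gaussian log-likelihood over $\boldsymbol{\beta}$ numerically and confirms that the result matches the ordinary least-squares solution, which is the well-known consequence of the normal-noise assumption.

```python
import numpy as np
from scipy.optimize import minimize

rng = np.random.default_rng(0)

# Simulate data from y = X beta + eps, with eps ~ N(0, sigma^2).
# All values here are arbitrary choices for the sake of the sketch.
n, k = 200, 3
X = np.column_stack([np.ones(n), rng.normal(size=(n, k))])  # intercept column
beta_true = np.array([2.0, 1.0, -0.5, 0.3])
sigma = 0.5
y = X @ beta_true + rng.normal(0.0, sigma, size=n)

def neg_log_lik(beta):
    # Negative Gaussian log-likelihood in beta (sigma held fixed,
    # additive constants dropped): 0.5 * ||y - X beta||^2 / sigma^2
    resid = y - X @ beta
    return 0.5 * np.sum(resid**2) / sigma**2

# Maximize the likelihood (minimize its negative) over beta
beta_mle = minimize(neg_log_lik, x0=np.zeros(k + 1)).x

# Ordinary least-squares solution for comparison
beta_ols, *_ = np.linalg.lstsq(X, y, rcond=None)

print(np.allclose(beta_mle, beta_ols, atol=1e-3))
```

Under Gaussian noise the log-likelihood is, up to constants, just the negative sum of squared residuals, so the maximum-likelihood estimate and the least-squares estimate coincide.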

Now I read somewhere that

In order to specify $\Pr(\mathbf{y}|\mathbf{X}, \boldsymbol{\beta})$ mathematically, we need to make assumptions about the noise term $\epsilon$. A common assumption is that $\epsilon$ follows a Gaussian distribution with zero mean and variance $\sigma_{\epsilon}^{2}$,
$$
\epsilon \sim N(0, \sigma_{\epsilon}^{2})
$$
This implies that the conditional probability density function of the output $Y$ for a given value of the input $X = x$ is given by
$$
\Pr(y|x, \beta) = N(y | \beta_0 + \beta_1x_1 + \beta_2x_2 + … + \beta_kx_k, \sigma_{\epsilon}^{2})
$$

Now my question is: what does the distribution of $\epsilon$ (the normal distribution in this case) have to do with the distribution of the MLE?

In other words, why does $\epsilon \sim N(0, \sigma_{\epsilon}^{2})$ also imply $$
\Pr(y|x, \beta) = N(y | \beta_0 + \beta_1x_1 + \beta_2x_2 + … + \beta_kx_k, \sigma_{\epsilon}^{2})
$$

Thank you in advance.

Best Answer

In other words, why does $\epsilon \sim N(0, \sigma_{\epsilon}^{2})$ also imply $$ \Pr(y|x, \beta) = N(y | \beta_0 + \beta_1x_1 + \beta_2x_2 + ... + \beta_kx_k, \sigma_{\epsilon}^{2}) $$

If $\epsilon \sim N(0, \sigma_{\epsilon}^{2})$

then $y = X\beta + \epsilon \sim N(X\beta, \sigma_{\epsilon}^{2})$, because adding a constant to a normal random variable shifts its mean and leaves its variance unchanged.

Hence $y \sim N( \beta_0 + \beta_1x_1 + \beta_2x_2 + ... + \beta_kx_k, \sigma_{\epsilon}^{2})$, which is exactly the conditional density quoted above.

Note that $X\beta$ is a constant, not a random variable, in this setting: $X$ is observed and $\beta$ is a fixed (unknown) parameter, so $y$ is random only because of $\epsilon$.
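This shift-of-mean argument is easy to verify numerically. The sketch below (with arbitrary made-up values for $X\beta$ and $\sigma_\epsilon$) draws Gaussian noise, adds the constant $X\beta$, and checks that the resulting sample has mean $X\beta$ and the same standard deviation as the noise:

```python
import numpy as np

rng = np.random.default_rng(1)

# A fixed (non-random) value standing in for X @ beta for one observation,
# and a noise standard deviation; both are arbitrary illustration values.
xb = 3.7
sigma = 2.0

# eps ~ N(0, sigma^2); then y = X beta + eps
eps = rng.normal(0.0, sigma, size=1_000_000)
y = xb + eps

# Adding the constant shifts the mean but not the spread, so y ~ N(xb, sigma^2)
print(round(y.mean(), 2))  # close to 3.7
print(round(y.std(), 2))   # close to 2.0
```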
