Since
$$\begin{align*}
\hat\beta &= (X^TX)^{-1}X^TY \\
&= (X^TX)^{-1}X^T(X\beta + \varepsilon) \\
&= \beta + (X^TX)^{-1}X^T\varepsilon
\end{align*}$$
and since $\varepsilon \sim \mathcal{N}(0, \sigma^2 I_n)$ with $X$ treated as fixed, we know that
$$\hat\beta-\beta \sim \mathcal{N}\left(0,\sigma^2 (X^TX)^{-1}\right)$$
and thus we know that for each component $k$ of $\hat\beta$,
$$\hat\beta_k -\beta_k \sim \mathcal{N}(0, \sigma^2 S_{kk})$$
where $S_{kk}$ is the $k^\text{th}$ diagonal element of $(X^TX)^{-1}$.
Thus, we know that
$$z_k = \frac{\hat\beta_k -\beta_k}{\sqrt{\sigma^2 S_{kk}}} \sim \mathcal{N}(0,1).$$
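As a quick sanity check, here is a minimal simulation sketch (numpy assumed; the design matrix, coefficients, and sample sizes are made up for illustration) that draws repeated samples from the model and confirms that $z_k$ behaves like a standard normal variate:

```python
import numpy as np

rng = np.random.default_rng(0)
n, p, sigma = 50, 3, 2.0
X = rng.normal(size=(n, p))      # fixed design with full column rank
beta = np.array([1.0, -2.0, 0.5])
S = np.linalg.inv(X.T @ X)       # (X^T X)^{-1}
k = 0                            # component to inspect

z = np.empty(20000)
for i in range(z.size):
    y = X @ beta + sigma * rng.normal(size=n)
    beta_hat = S @ X.T @ y       # OLS estimate
    z[i] = (beta_hat[k] - beta[k]) / np.sqrt(sigma**2 * S[k, k])

print(z.mean(), z.std())         # should be close to 0 and 1
```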
Take note of the statement of the Theorem for the Distribution of an Idempotent Quadratic Form in a Standard Normal Vector (Theorem B.8 in Greene):
If $x\sim\mathcal{N}(0,I)$ and $A$ is symmetric and idempotent, then $x^TAx$ is distributed $\chi^2_{\nu}$ where $\nu$ is the rank of $A$.
Let $\hat\varepsilon$ denote the regression residual vector and let
$$M=I_n - X(X^TX)^{-1}X^T \text{,}$$
which is the residual maker matrix (i.e. $My=\hat\varepsilon$). It's easy to verify that $M$ is symmetric and idempotent.
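A quick numerical check of these properties, as a sketch with an arbitrary small example (numpy assumed):

```python
import numpy as np

rng = np.random.default_rng(1)
n, p = 20, 3
X = rng.normal(size=(n, p))
y = rng.normal(size=n)

M = np.eye(n) - X @ np.linalg.inv(X.T @ X) @ X.T
beta_hat = np.linalg.lstsq(X, y, rcond=None)[0]  # OLS fit

print(np.allclose(M, M.T))                   # symmetric
print(np.allclose(M @ M, M))                 # idempotent
print(np.allclose(M @ y, y - X @ beta_hat))  # M y = residual vector
```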
Let
$$s^2 = \frac{\hat\varepsilon^T \hat\varepsilon}{n-p}$$
be an estimator for $\sigma^2$, where $p$ is the number of columns of $X$.
Next we need a bit of linear algebra. Note these three properties:
- The rank of an idempotent matrix equals its trace.
- $\operatorname{Tr}(A_1+A_2) = \operatorname{Tr}(A_1) + \operatorname{Tr}(A_2)$
- $\operatorname{Tr}(A_1A_2) = \operatorname{Tr}(A_2A_1)$ when $A_1$ is $n_1 \times n_2$ and $A_2$ is $n_2 \times n_1$ (this cyclic property is what makes the computation below work)
So
$$\begin{align*}
\operatorname{rank}(M) = \operatorname{Tr}(M) &= \operatorname{Tr}(I_n - X(X^TX)^{-1}X^T) \\
&= \operatorname{Tr}(I_n) - \operatorname{Tr}\left( X(X^TX)^{-1}X^T \right) \\
&= \operatorname{Tr}(I_n) - \operatorname{Tr}\left( (X^TX)^{-1}X^TX \right) \\
&= \operatorname{Tr}(I_n) - \operatorname{Tr}(I_p) \\
&=n-p
\end{align*}$$
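Numerically, the trace and rank of $M$ do agree at $n-p$; a small sketch (numpy assumed, arbitrary dimensions):

```python
import numpy as np

rng = np.random.default_rng(2)
n, p = 30, 4
X = rng.normal(size=(n, p))
M = np.eye(n) - X @ np.linalg.inv(X.T @ X) @ X.T

print(np.trace(M))               # ~ 26.0, i.e. n - p
print(np.linalg.matrix_rank(M))  # 26
```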
Then
$$\begin{align*}
V = \frac{(n-p)s^2}{\sigma^2} = \frac{\hat\varepsilon^T\hat\varepsilon}{\sigma^2} = \left(\frac{\varepsilon}{\sigma}\right)^T M \left(\frac{\varepsilon}{\sigma}\right),
\end{align*}$$
where the last equality uses $\hat\varepsilon = My = M(X\beta + \varepsilon) = M\varepsilon$ (because $MX = 0$) together with $M^T M = M$.
Applying the Theorem for the Distribution of an Idempotent Quadratic Form in a Standard Normal Vector (stated above), we know that $V \sim \chi^2_{n-p}$.
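A short Monte Carlo sketch of this fact (numpy and scipy assumed; all parameters are illustrative), comparing simulated values of $V$ against the $\chi^2_{n-p}$ distribution:

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(3)
n, p, sigma = 20, 4, 1.5
X = rng.normal(size=(n, p))
M = np.eye(n) - X @ np.linalg.inv(X.T @ X) @ X.T

V = np.empty(10000)
for i in range(V.size):
    eps = sigma * rng.normal(size=n)
    V[i] = (eps / sigma) @ M @ (eps / sigma)  # quadratic form in N(0, I)

print(V.mean())                                   # ~ n - p = 16
print(stats.kstest(V, stats.chi2(df=n - p).cdf))  # high p-value expected
```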
Because $\varepsilon$ is assumed normally distributed, $\hat\beta$ and $\hat\varepsilon$ are jointly normal and uncorrelated, hence independent; and since $s^2$ is a function of $\hat\varepsilon$, $s^2$ is also independent of $\hat\beta$. Thus, $z_k$ and $V$ are independent of each other.
Then,
$$\begin{align*}
t_k = \frac{z_k}{\sqrt{V/(n-p)}}
\end{align*}$$
is the ratio of a standard normal random variable to the square root of an independent chi-squared random variable divided by its degrees of freedom (here $n-p$), which is exactly the characterization of the $t$ distribution. Therefore, the statistic $t_k$ has a $t$ distribution with $n-p$ degrees of freedom.
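Here is a small Monte Carlo sketch (numpy/scipy assumed, illustrative parameters) comparing simulated $t_k$ statistics against the $t_{n-p}$ distribution:

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(4)
n, p, sigma = 25, 3, 1.5
X = rng.normal(size=(n, p))
beta = np.array([0.3, -1.0, 2.0])
S = np.linalg.inv(X.T @ X)
k = 1

t_stats = np.empty(10000)
for i in range(t_stats.size):
    y = X @ beta + sigma * rng.normal(size=n)
    beta_hat = S @ X.T @ y
    resid = y - X @ beta_hat
    s2 = resid @ resid / (n - p)              # estimate of sigma^2
    t_stats[i] = (beta_hat[k] - beta[k]) / np.sqrt(s2 * S[k, k])

print(stats.kstest(t_stats, stats.t(df=n - p).cdf))  # agreement with t_{n-p}
```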
The statistic $t_k$ can then be algebraically manipulated into a more familiar form:
$$\begin{align*}
t_k &= \frac{\frac{\hat\beta_k -\beta_k}{\sqrt{\sigma^2 S_{kk}}}}{\sqrt{\frac{(n-p)s^2}{\sigma^2}/(n-p)}} \\
&= \frac{\frac{\hat\beta_k -\beta_k}{\sqrt{S_{kk}}}}{\sqrt{s^2}} = \frac{\hat\beta_k -\beta_k}{\sqrt{s^2 S_{kk}}} \\
&= \frac{\hat\beta_k -\beta_k}{\operatorname{se}\left(\hat\beta_k \right)}
\end{align*}$$
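As a cross-check, the hand-computed $\sqrt{s^2 S_{kk}}$ should match the standard errors a standard OLS routine reports; a sketch assuming statsmodels is available (the data here is made up):

```python
import numpy as np
import statsmodels.api as sm

rng = np.random.default_rng(5)
n, p = 40, 3
X = rng.normal(size=(n, p))
y = X @ np.array([1.0, 0.0, -0.5]) + rng.normal(size=n)

fit = sm.OLS(y, X).fit()
S = np.linalg.inv(X.T @ X)
s2 = fit.resid @ fit.resid / (n - p)

print(np.sqrt(s2 * np.diag(S)))  # hand-computed se(beta_hat)
print(fit.bse)                   # reported standard errors: should match
```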
To establish a more general result, I refer to the lecture notes here.
Suppose we have the multiple linear regression model
$$y=X\beta + \varepsilon,$$
where the design matrix $X$ (with non-random entries) of order $n\times k$ has full column rank and $\beta=(\beta_1,\ldots,\beta_k)^T$ is the vector of regression coefficients.
Further assume that $\varepsilon \sim N_n(0,\sigma^2 I_n)$ with $\sigma^2$ unknown, so that $y\sim N_n(X\beta,\sigma^2 I_n)$.
Under this setting, we know that the OLS estimator of $\beta$ is $$\hat\beta=(X^T X)^{-1}X^T y\sim N_k\left(\beta,\sigma^2(X^T X)^{-1}\right)$$
So, $$(y-X\hat\beta)^T X(\hat\beta-\beta)=\left(y^TX-\hat\beta^TX^TX\right)(\hat\beta-\beta)=(y^TX-y^T X)(\hat\beta-\beta)=0,$$ where the second equality uses the normal equations $X^TX\hat\beta=X^Ty$.
Hence,
\begin{align}
\left\| y-X\beta \right\|^2 &=\left\|y-X\hat\beta+X\hat\beta-X\beta\right\|^2
\\&=\left\|y-X\hat\beta\right\|^2+\left\|X\hat\beta-X\beta\right\|^2
\\&=\left\|y-X\hat\beta\right\|^2+\left\|X\hat\beta\right\|^2+\left\|X\beta\right\|^2-2\beta^T X^T y
\end{align}
where the last line expands $\left\|X\hat\beta-X\beta\right\|^2$ and again uses the normal equations to replace the cross term $-2\beta^TX^TX\hat\beta$ with $-2\beta^TX^Ty$.
The pdf of $y$ now looks like
\begin{align}
f(y;\beta,\sigma^2)&=\frac{1}{(2\pi\sigma^2)^{n/2}}\exp\left[-\frac{1}{2\sigma^2}\left\| y-X\beta \right\|^2\right]
\\&=\frac{1}{(2\pi\sigma^2)^{n/2}}\exp\left[\frac{1}{\sigma^2}\beta^T X^T y-\frac{1}{2\sigma^2}\left(\left\|y-X\hat\beta\right\|^2+\left\|X\hat\beta\right\|^2\right)-\frac{1}{2\sigma^2}\left\|X\beta\right\|^2\right]
\end{align}
Noting that $X\hat\beta$ is a function of $X^T y$ and that the above density is a member of the exponential family, a complete sufficient statistic for $(\beta,\sigma^2)$ is given by $$T=\left(X^Ty,\left\|y-X\hat\beta\right\|^2\right)$$
Now $\hat\beta$, being a function of $X^T y$, is a function of $T$, and it is unbiased for $\beta$.
So, by the Lehmann–Scheffé theorem, $\hat\beta$ is the UMVUE of $\beta$.
The OLS (Ordinary Least Squares) estimate does not depend on the distribution $D$, so for any distribution you can use the exact same tools as for the normal distribution.
This gives only the OLS estimates of the parameters; it does not by itself justify any tests or other inference, which could depend on your distribution $D$. However, the Central Limit Theorem holds for regression, so for large enough sample sizes (how big depends on how non-normal $D$ is) the normal-based tests and inference will still be approximately correct.
If you want maximum likelihood estimation instead of OLS, then the estimate will depend on $D$. The normal distribution has the advantage that OLS gives the maximum likelihood answer as well.
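To illustrate that last point, a small sketch (numpy/scipy assumed, made-up data) showing that numerically maximizing the normal likelihood in $\beta$ recovers the OLS solution:

```python
import numpy as np
from scipy import optimize

rng = np.random.default_rng(6)
n, p = 60, 3
X = rng.normal(size=(n, p))
y = X @ np.array([2.0, -1.0, 0.5]) + rng.normal(size=n)

def neg_log_lik(b):
    # Normal negative log-likelihood in beta, up to constants (sigma fixed).
    r = y - X @ b
    return 0.5 * r @ r

mle = optimize.minimize(neg_log_lik, np.zeros(p)).x  # numerical MLE
ols = np.linalg.lstsq(X, y, rcond=None)[0]           # closed-form OLS
print(np.allclose(mle, ols, atol=1e-4))              # should print True
```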