This is pretty common notation:
Taken together, you have an $AR(q)-GARCH(p,q)$ model there (with slight abuse of notation as we have $q$ twice).
But is the one step ahead predictor not already defined as the value $\hat \sigma$ of the volatility that minimizes the MSE?
If you estimate the GARCH model by maximum likelihood, then the fitted values $\hat\sigma_t$ are the likelihood-maximizing values (subject to the GARCH(1,1) functional form), which need not coincide with the MSE-minimizing values. That depends on the distribution assumed in the likelihood calculation.
Also, when fitting the model on a data sample indexed from $1$ to $T$, the fitted value $\hat\sigma^2_t$ for $t<T$ uses information not only from $1\leqslant\tau<t$ but also from $t\leqslant\tau\leqslant T$. Meanwhile, to evaluate forecast performance fairly, you should not allow the estimate to use future data; instead, you need to check the forecast accuracy out of sample.
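To make that distinction concrete, here is a minimal NumPy sketch (the parameter values are hypothetical; in practice they would come from ML estimation). Given the parameters, the one-step-ahead forecast for day $t$ only ever touches returns up to day $t-1$:

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical GARCH(1,1) parameters; in practice they come from ML estimation.
omega, alpha, beta = 0.05, 0.08, 0.90

# Simulate T daily returns eps_t with conditional variance sigma2_t.
T = 1000
eps = np.empty(T)
sigma2 = np.empty(T)
sigma2[0] = omega / (1 - alpha - beta)          # unconditional variance
eps[0] = np.sqrt(sigma2[0]) * rng.standard_normal()
for t in range(1, T):
    sigma2[t] = omega + alpha * eps[t - 1] ** 2 + beta * sigma2[t - 1]
    eps[t] = np.sqrt(sigma2[t]) * rng.standard_normal()

# One-step-ahead forecast for day t uses only data up to day t-1:
sigma2_hat = np.empty(T)
sigma2_hat[0] = omega / (1 - alpha - beta)
for t in range(1, T):
    sigma2_hat[t] = omega + alpha * eps[t - 1] ** 2 + beta * sigma2_hat[t - 1]

# With the true parameters this filter reproduces sigma2 exactly; the point
# of the example is that the recursion never looks at eps[t], ..., eps[T-1].
# For an honest out-of-sample evaluation, the parameters themselves should
# also be re-estimated on an expanding window ending at t-1, not on all T days.
```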
So why compute this measure if it is going to be the minimum across models anyway?
Because you may want to know what the actual value of the MSE is. Knowing that the MSE is minimal does not tell you what its value is. (See also the answer to the previous question.)
Moreover do I understand correctly that the practical way to compute the $MSE$...
That will be the out-of-sample mean squared forecast error. If you want the in-sample MSE, just use the fitted values from the model estimated on the whole sample. The former should give an unbiased estimate of model performance, while the latter will be too optimistic in that respect.
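A minimal sketch of the computation (the numbers are made up for illustration): the mean squared forecast error compares the variance forecasts against a proxy for the true variance, here the squared daily returns.

```python
import numpy as np

def mse(proxy, forecast):
    """Mean squared (forecast) error between a variance proxy and forecasts."""
    proxy = np.asarray(proxy, dtype=float)
    forecast = np.asarray(forecast, dtype=float)
    return np.mean((proxy - forecast) ** 2)

# Hypothetical numbers: squared daily returns r_t^2 as the proxy and
# one-step-ahead variance forecasts sigma2_hat_t from some model.
r2 = np.array([1.2, 0.4, 2.1, 0.9])
h = np.array([1.0, 0.8, 1.5, 1.1])
print(mse(r2, h))   # ≈ 0.15
```

For the out-of-sample version, `h` must contain forecasts made using only data prior to each day; for the in-sample version, it contains the full-sample fitted values.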
Also, I understand from this paper (Andersen and Bollerslev 1998) that utilizing the squared daily return to approximate the realized volatility leads to noise.
Very important point indeed! The response variable of the GARCH model is measured with noise when squared errors are used as proxies; this noise may be quite substantial. When trying to assess model fit, the measurement error associated with the dependent variable may cause quite some trouble.
For example, a better estimate of realized daily volatility would be the sum of squared 30-minute returns of that day.
At first thought, that could be a valid option. If that is what was proposed in the Andersen and Bollerslev (1998) paper, then it should be fine.
So to get a "better" MSE I could substitute every $\sigma^2_i$ with the sum of squared 30-minute returns of that day instead of simply the daily squared return? Is this how you use realized volatility to evaluate the goodness of your forecasts?
That makes sense to me.
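As a sketch (the intraday returns are simulated for illustration), the realized-variance proxy for one day is just the sum of its squared intraday returns:

```python
import numpy as np

def realized_variance(intraday_returns):
    """Realized variance of one day: sum of squared intraday returns."""
    r = np.asarray(intraday_returns, dtype=float)
    return np.sum(r ** 2)

# A trading day split into 13 thirty-minute intervals (hypothetical returns):
rng = np.random.default_rng(1)
r30 = 0.001 * rng.standard_normal(13)
rv = realized_variance(r30)

# rv then replaces the squared daily return as the proxy in the MSE:
#   MSE = mean over days t of (rv_t - sigma2_hat_t)^2
```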
I will end this rambling by asking for a good reference in evaluating the accuracy of the forecasts using realized volatility...
Regarding references, I think the Andersen and Bollerslev (1998) paper is quite relevant and complete. Also see Patton & Sheppard "Evaluating volatility and correlation forecasts" (2009) (free version here) and other works by Patton.
Best Answer
It seems you want to measure the true underlying variance (volatility) to be able to assess how well this volatility is captured by your models. The problem is that volatility is not directly observed and you do not have other (high-frequency) data from which you could create a good proxy for the daily volatility (such as using realized variance from the high-frequency data to proxy for the true variance of any given day in your daily data).
As far as I know, there is no solution to your problem. That is, technically it is impossible to measure the underlying daily variance when you only have one observation per day. You have a daily series of $T$ days where the variance may be different on each day, and there may be no connection between the variance on day $t$ and the variance on day $s$ for any $t,s\in\{1,\dots,T\}$ with $t\neq s$. Thus you can effectively treat your series as $T$ separate random variables, unless you are willing to assume some structure that connects them. And it is clearly impossible to measure the variance of a random variable from a single realization without making further assumptions.
If you were willing to make some assumptions, you could start from assuming that the expectation is zero on each day. Then you would be technically able to calculate the sample variance which would be, on any day $t$, $$ \widehat{Var}_t(x_t) = (x_t-0)^2 = x_t^2. $$ This would be a very noisy measure of variance, so it would not be a "good" proxy for the true variance. Unfortunately, you can only get this far without making further assumptions or collecting data at a higher frequency.
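The noisiness of this proxy can be illustrated with a quick Monte Carlo sketch (all numbers are hypothetical): for Gaussian $x_t$ with mean zero and variance $\sigma^2$, the proxy $x_t^2$ is unbiased for $\sigma^2$, but $Var(x_t^2)=2\sigma^4$, so its standard deviation $\sqrt{2}\,\sigma^2$ exceeds the very quantity it estimates.

```python
import numpy as np

rng = np.random.default_rng(2)
sigma2_true = 4.0

# One observation per "day": x_t ~ N(0, sigma2_true); the proxy is x_t^2.
x = np.sqrt(sigma2_true) * rng.standard_normal(100_000)
proxy = x ** 2

# Unbiased when averaged over many days ...
print(proxy.mean())   # close to 4.0

# ... but very noisy on any single day: for Gaussian x,
# Var(x^2) = 2 * sigma2_true**2 = 32, so the proxy's standard
# deviation (about 5.66) is larger than the variance it estimates.
print(proxy.std())
```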