Here are the short answers to your questions:
The regression model allows you to use the structure of the model to estimate the mean at given predictor values by plugging those values into the fitted model equation. You can even do this for predictor values you didn't directly observe in the data set (more about this below).
The difference is that the model gives you the expected value (mean) conditional on the predictor values and the random effects, which can be quite different from the mean unconditional on the predictor values (which I assume is what you mean by the means calculated by hand) if the predictors truly are important in predicting the response.
Here are the long answers to your questions:
Specifically, in the generalized linear mixed model you must be dealing with data that has some clustering in it, so let $Y_{ij}$ be the outcome variable for individual $i$ in cluster $j$ with corresponding predictor values ${\bf X}_{ij}$. The basic generalized linear mixed model with a random intercept is
$$ \varphi \left( E(Y_{ij}|{\bf X}_{ij}, \eta_j) \right) = \alpha + {\bf X}_{ij}{\boldsymbol \beta} + \eta_j $$
where $\eta_j \sim N(0,\sigma^{2}_{\eta})$ is the cluster $j$ random effect and $\varphi(\cdot)$ is the link function; in the identity-link (linear) case there is also a residual error term $\varepsilon_{ij} \sim N(0,\sigma^{2}_{\varepsilon})$, which does not appear in the mean model above. The link function is used to express the mean of the outcome variable as something that is linear in the predictors and random effects, and is often used to map a bounded value (e.g. one in $(0,1)$) to an unbounded range. The link function depends on the type of GLM. Some examples (a short statsmodels sketch follows the list):
$\varphi(x) = x$ is the identity link, in which case you have a linear model (this appears to be the case in the example you've mentioned)
$\varphi(x) = \log \left( \frac{x}{1-x} \right)$ is the logit link, used for logistic GLMMs.
$\varphi(x) = \log(x)$ is the log link, used for Poisson GLMMs.
$\varphi(x) = \Phi^{-1}(x)$ is the probit link, used for probit GLMMs.
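For concreteness, all of these links are available in Python's statsmodels (assuming a recent version, where the link classes are capitalized); a minimal sketch with made-up values:

```python
import numpy as np
from statsmodels.genmod.families import links

p = np.array([0.1, 0.5, 0.9])  # illustrative mean values in (0, 1)

logit = links.Logit()           # the logit link above
print(logit(p))                 # log(p / (1 - p)): maps (0, 1) onto the real line
print(logit.inverse(logit(p)))  # the inverse link recovers p

# The other links work the same way:
print(links.Log()(np.array([1.0, 2.0])))  # log link, for Poisson models
print(links.Probit()(p))                  # probit link
```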
To answer question (1): Now, if we have estimates of $\alpha$ and $\boldsymbol \beta$, then we can estimate the conditional mean of $Y_{ij}$, given the random effects and the predictor values:
$$ \widehat{E}(Y_{ij} | {\boldsymbol X}_{ij},\eta_j) = \varphi^{-1}(\hat \alpha + {\bf X}_{ij}\hat {\boldsymbol \beta} + \eta_j)$$
In your example, the times are covariates (i.e. ${\bf X}_{ij}$s) in the model which are plugged into the regression equation to estimate the monthly means.
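As a concrete sketch with the identity link (which matches your example): in Python's statsmodels, a random-intercept model can be fit with MixedLM, and its fitted values are exactly the conditional means above, with the predicted random effect plugged in for $\eta_j$. The data here are simulated stand-ins; the column names y, month, and cluster are hypothetical.

```python
import numpy as np
import pandas as pd
import statsmodels.formula.api as smf

# Simulated stand-in data: monthly outcomes within clusters
rng = np.random.default_rng(0)
n_clusters, n_months = 20, 12
df = pd.DataFrame({
    "cluster": np.repeat(np.arange(n_clusters), n_months),
    "month": np.tile(np.arange(1, n_months + 1), n_clusters),
})
eta = rng.normal(0, 1.0, n_clusters)  # true cluster random intercepts
df["y"] = (2.0 + 0.5 * df["month"]
           + eta[df["cluster"].to_numpy()] + rng.normal(0, 0.5, len(df)))

# Random-intercept linear mixed model (identity link)
fit = smf.mixedlm("y ~ month", df, groups=df["cluster"]).fit()

# Conditional means: alpha_hat + month * beta_hat + eta_hat_j
print(fit.fittedvalues.head())
```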
One useful thing this allows you to do is to estimate the mean for a predictor value you didn't directly observe. That is, the (linear) structure imposed by the model allows you to estimate $E(Y|X)$ for values of $X$ you didn't observe, by linear interpolation. When the linear model is a good fit, this is a very nice bonus, since you wouldn't be able to calculate this quantity by hand without the model.
Note: One should exercise extreme caution when plugging in predictor values that go outside of the range of the data used to fit the model. This is called extrapolation and can perform arbitrarily poorly if the model does not apply outside of the range of the observed data.
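Continuing the sketch above, here is how you would estimate the conditional mean at a predictor value that never occurs in the data, and where the extrapolation danger enters:

```python
# Estimate the conditional mean at month = 4.5, which was never observed
# (interpolation: 4.5 lies inside the observed range 1..12).
alpha_hat = fit.params["Intercept"]
beta_hat = fit.params["month"]
eta_hat_0 = fit.random_effects[0].iloc[0]  # predicted intercept for cluster 0

print(alpha_hat + beta_hat * 4.5 + eta_hat_0)

# Plugging in month = 40 would be extrapolation; the fit gives no guarantee there.
```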
The equation above can also be used to estimate relative changes in an individual's average outcome for different predictor values. Often this relative change doesn't even depend on the unobserved random effect. For example, with the log link:
$$\frac{ \widehat{E}(Y_{ij} | {\boldsymbol X}_{ij},\eta_j) }
{ \widehat{E}(Y_{ij} | {\boldsymbol X}'_{ij},\eta_j) }
= \frac{ \exp (\hat \alpha + {\bf X}_{ij}\hat {\boldsymbol \beta} + \eta_j)}{ \exp (\hat \alpha + {\bf X}'_{ij}\hat {\boldsymbol \beta} + \eta_j)} = \exp \left( ({\bf X}_{ij}-{\bf X}'_{ij})\hat {\boldsymbol \beta} \right) $$
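A tiny numeric illustration of this cancellation (the coefficient value is made up):

```python
import numpy as np

# Hypothetical log-link fit with a single predictor
beta_hat = 0.30
x, x_prime = 5.0, 3.0

# Ratio of conditional means: alpha_hat and eta_j cancel
ratio = np.exp((x - x_prime) * beta_hat)
print(ratio)  # exp(2 * 0.3) ~ 1.82, i.e. an ~82% higher conditional mean
```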
Note: The mean unconditional on the random effect is the average over the random effects:
$$ E(Y_{ij} | {\boldsymbol X}_{ij}) = E_{\eta} \left( E(Y_{ij} | {\boldsymbol X}_{ij},\eta_j) \right) $$
which is not trivially related to the conditional expectation (since it is an integral over the conditional expectation) except when the identity link is used (i.e. you have a linear model). Some remarks about the mean unconditional on the random effects (a numerical sketch follows these remarks):
In a linear mixed effects model, the mean unconditional on the random effects is the same as the conditional mean - this is fairly clear when you plug in the identity link and use the fact that the random effects have mean 0. This fact does not always hold in the case with non-linear models (see here for more discussion).
The change in the conditional mean as a function of the predictors (but unconditional on the random effects) is related to the average change in the population for a change in the predictor value.
The change in the conditional mean when the random effects are conditioned on is related to the change in the expected value for a particular individual for a change in the predictor value. More can be read about this in the link above.
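To see the non-trivial relationship concretely, here is a minimal Monte Carlo sketch for the log link: by the lognormal moment formula, the marginal mean is $\exp(\alpha + {\bf X}{\boldsymbol \beta} + \sigma^{2}_{\eta}/2)$, not $\exp(\alpha + {\bf X}{\boldsymbol \beta})$.

```python
import numpy as np

rng = np.random.default_rng(1)
alpha, beta, x, sigma = 0.5, 0.2, 2.0, 1.0
eta = rng.normal(0.0, sigma, 1_000_000)
linpred = alpha + beta * x

print(np.exp(linpred))                 # conditional mean at eta = 0: ~2.46
print(np.exp(linpred + eta).mean())    # Monte Carlo marginal mean: ~4.06
print(np.exp(linpred + sigma**2 / 2))  # lognormal closed form: matches the MC value
# With the identity link the two coincide, since E(eta) = 0.
```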
To answer question (2): When you calculate the sample mean you're effectively marginalizing over the predictor values and throwing out the information they provide. That is, assuming your ${\bf X}$s are a random sample (that is, your data are not from a retrospective study or something else like that), then the sample mean estimates
$$ E(Y_{ij}) = E_{\bf X} \left( E(Y_{ij} | {\bf X}_{ij} ) \right) $$
If ${\bf X}_{ij}$ truly does have an effect, then $E(Y_{ij} | {\bf X}_{ij} )$ can be very different from $E(Y_{ij})$ and that's the explanation for what you're seeing.
Note that if you have categorical predictors, then you can estimate the conditional means by stratifying the data set and calculating the mean within each stratum, but this is exactly what the regression model does. So, in any case, you're no worse off using the regression estimates of the mean, as long as the model fits reasonably well.
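A quick check of that equivalence, using hypothetical data and a cell-means coding in statsmodels:

```python
import numpy as np
import pandas as pd
import statsmodels.formula.api as smf

rng = np.random.default_rng(2)
df = pd.DataFrame({"month": np.repeat(["Jan", "Feb", "Mar"], 50)})
df["y"] = df["month"].map({"Jan": 1.0, "Feb": 2.0, "Mar": 3.0}) + rng.normal(0, 1, 150)

# Stratified sample means...
print(df.groupby("month")["y"].mean())

# ...are reproduced exactly by a regression with cell-means coding
print(smf.ols("y ~ C(month) - 1", df).fit().params)
```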
I think treating this as a generalized linear model is overkill. What you have is a plain old regression model. More specifically, because you have some categorical explanatory variables and a continuous EV, but no interactions between them, this could also be called a classic ANCOVA.
I would say that #3 is not really an assumption here that you need to worry about. Nor, for that matter, do you really need to worry about #2. Instead, I would replace these with two different assumptions:
2'. Homogeneity of variance
3'. Normality of residuals
Furthermore, #4 is an important thing to check, but I don't really think of it as an assumption per se. Let's think about how assumptions can be checked (the snippets below illustrate each check against a toy ANCOVA fit, sketched first).
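To make the checks below concrete, here is a toy ANCOVA fit in Python (statsmodels); the data and the column names (group, x, y) are hypothetical stand-ins, and the later snippets reuse fit, resid, and df:

```python
import numpy as np
import pandas as pd
import statsmodels.formula.api as smf

# Hypothetical stand-in data: one categorical EV ('group'), one continuous EV ('x')
rng = np.random.default_rng(3)
df = pd.DataFrame({
    "group": np.repeat(["A", "B", "C"], 40),
    "x": rng.uniform(0, 10, 120),
})
df["y"] = (df["group"].map({"A": 0.0, "B": 1.0, "C": 2.0})
           + 0.5 * df["x"] + rng.normal(0, 1, 120))

# Classic ANCOVA: categorical + continuous EVs, no interaction
fit = smf.ols("y ~ C(group) + x", df).fit()
resid = fit.resid
```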
Independence is often 'checked' first by thinking about what the data stand for and how they were collected. In addition, it can be checked using things like a runs test, the Durbin-Watson test, or by examining the pattern of autocorrelations; you can also look at partial autocorrelations. (Note that these can only be assessed relative to your continuous covariate.)
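Continuing the sketch, a Durbin-Watson check on the residuals ordered by the continuous covariate:

```python
import numpy as np
from statsmodels.stats.stattools import durbin_watson

# Order the residuals by the continuous covariate, then test for serial
# dependence; values near 2 are consistent with independence.
resid_by_x = resid.to_numpy()[np.argsort(df["x"].to_numpy())]
print(durbin_watson(resid_by_x))
```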
With primarily categorical explanatory variables, homogeneity of variance can be checked by calculating the variance at each level of your factors. Having computed these, there are several tests used to check if they are about the same, primarily Levene's test, but also the Brown-Forsythe test. The $F_{max}$ test, also called Hartley's test, is not recommended; if you would like a little more information about that, I discuss it here. (Note that these tests, unlike those above, can be applied with your categorical explanatory variables.) For a continuous EV, I like to just plot my residuals against the continuous covariate and examine them visually to see if they spread out further to one side or the other.
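For example, with scipy (reusing df from above):

```python
from scipy import stats

# Variance of y at each level of the factor
print(df.groupby("group")["y"].var())

# Levene's test; center="median" gives the Brown-Forsythe variant
samples = [g.to_numpy() for _, g in df.groupby("group")["y"]]
print(stats.levene(*samples, center="median"))
```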
The normality of the residuals can be assessed via tests like the Shapiro-Wilk or the Kolmogorov-Smirnov test, but is often best assessed visually via a qq-plot. (Note that this assumption is generally the least important of the set; if it is not met, your beta estimates will still be unbiased, but your p-values will be inaccurate.)
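Reusing resid from the sketch above:

```python
import matplotlib.pyplot as plt
import statsmodels.api as sm
from scipy import stats

print(stats.shapiro(resid))  # Shapiro-Wilk test of the residuals
sm.qqplot(resid, line="s")   # the visual check is usually the more informative one
plt.show()
```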
There are several ways to assess the influence of your individual observations. It is possible to get numerical values that index this, but my favorite way, if you can do it, is to jackknife your data. That is, you drop each data point in turn and re-fit your model. Then you can examine how much your betas bounce around when that observation is not part of your dataset; this measure is called dfbeta. Computing it by hand requires a bit of programming, but there are standard influence measures that software can often compute for you automatically, including leverage and Cook's distance.
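Reusing the toy fit, statsmodels exposes all three quantities through its influence machinery:

```python
from statsmodels.stats.outliers_influence import OLSInfluence

infl = OLSInfluence(fit)
print(infl.dfbetas[:3])            # scaled change in each beta when a point is dropped
print(infl.hat_matrix_diag[:3])    # leverage
print(infl.cooks_distance[0][:3])  # Cook's distance
```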
Regarding your question as originally stated, if you want to know more about link functions and the generalized linear model, I discussed that fairly extensively here. Basically, the most important thing to consider in order to select an appropriate link function is the nature of your response distribution; since you believe $Y$ is Gaussian, the identity link is appropriate, and you can just think of this situation using standard ideas about regression models.
Concerning the "correct scale of measurement of explanatory variables", I take you to be referring to Steven's levels of measurement (i.e., categorical, ordinal, interval & ratio). The first thing to realize is that regression methods (including GLiM's) do not make assumptions about the explanatory variables, instead, the manner in which you use your explanatory variables in your model reflects your beliefs about them. Furthermore, I tend to think Steven's levels are overplayed; for a more theoretical treatment of that topic, see here.
Best Answer
It sounds like you are proposing essentially a two-stage approach, where stage one reduces each cluster to its standard deviation about a cluster-specific mean. This seems fine, although note that you could actually model at the observation level, i.e., let the variance of each observation be a linear function of covariates. Note, though, that I don't know of any off-the-shelf software that would allow for exactly that.
Returning to the two-stage approach: if the observations in cluster $i=1,\ldots,N$ are normally distributed, e.g. $Z_{ij} \sim N(\mu_i, \rho^2_i)$, then the sample variances will be scaled chi-square distributed with $N_i - 1$ degrees of freedom. Letting $S^2_i$ denote the sample variance in cluster $i$, we have $$S^2_i \sim \frac{\rho^2_i}{N_i-1} \times \chi^2(N_i-1).$$
In more detail, we have that \begin{align*} E S^2_i & = \rho^2_i, \\ Var S^2_i & = 2\frac{\rho_i^4}{N_i - 1}. \end{align*}
A gamma GLM assumes that $Var Y = \phi (E Y)^2$, so this might actually be a case for gamma regression, with an identity link! (Which is a first for me, I think.) If the $N_i$ differ very much, then you need precision weights proportional to $N_i-1$, since $Var S^2_i$ is inversely proportional to $N_i - 1$.
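Here is a hedged end-to-end sketch in Python (statsmodels); the data-generating step is made up, and the var_weights argument implements the precision weighting just described:

```python
import numpy as np
import statsmodels.api as sm

rng = np.random.default_rng(4)

# Stage 1 (made-up data): clusters whose true variance is linear in a covariate w;
# reduce each cluster to its sample variance.
n_clusters = 200
w = rng.uniform(1, 5, n_clusters)
true_var = 1.0 + 2.0 * w
n_i = rng.integers(5, 30, n_clusters)
s2 = np.array([rng.normal(0, np.sqrt(v), n).var(ddof=1)
               for v, n in zip(true_var, n_i)])

# Stage 2: gamma GLM with identity link; Var(S2_i) = 2 rho_i^4 / (N_i - 1),
# so the precision weights are proportional to N_i - 1.
X = sm.add_constant(w)
fam = sm.families.Gamma(link=sm.families.links.Identity())
fit = sm.GLM(s2, X, family=fam, var_weights=n_i - 1).fit()
print(fit.params)  # should land near the true values (1.0, 2.0)
```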