With GLMs, it's generally best not to think of them as "conditional mean + error" models but as "conditional distribution" models.
In the case of the Gamma model, note that the variance is proportional to the square of the mean. If you really want to write an error term and you have a log link, you can either write it as an additive error model on the log-scale (with constant variance) or as a multiplicative error model on the original scale (though then the response variance changes with the mean). I wouldn't write it as an additive model on the original scale.
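To make that concrete, here's a minimal R sketch of a Gamma GLM with a log link on simulated data (the shape value and variable names are purely illustrative):

    # simulate Gamma data whose mean is log-linear in x
    set.seed(1)
    x  <- runif(200)
    mu <- exp(1 + 2 * x)                        # mean on the original scale
    y  <- rgamma(200, shape = 5, rate = 5 / mu) # Var(y) = mu^2 / 5, i.e. proportional to mean^2

    fit <- glm(y ~ x, family = Gamma(link = "log"))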
The log of a gamma random variable isn't at all bad to deal with, so the additive-error version is fairly convenient if you want to work with an error model.
Beware, however -- the additive error version results in an error term with a non-zero mean (it's also left-skewed, but that's less of an issue). It's easy enough to compute an adjustment for that non-zero mean, though, so that you can correct the bias on the log-scale (or you could even fit a least-squares model to the logs and compute an adjustment for the original scale).
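If it helps, here's that adjustment sketched in R, using the standard result that for a Gamma variable with shape alpha and mean mu, E[log Y] = log(mu) + digamma(alpha) - log(alpha) (the shape and mean values here are just for illustration):

    # bias of log Y as an estimate of log mu, for a Gamma with shape alpha
    alpha <- 5
    adj   <- digamma(alpha) - log(alpha)   # negative: log Y sits below log mu on average

    # check by simulation
    mu <- 10
    y  <- rgamma(1e5, shape = alpha, rate = alpha / mu)
    mean(log(y)) - log(mu)                 # close to adj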
It seems that "quasi-likelihood" must be applied in order to fit lognormal models without a log transform.
In fact, if you want ML estimation, taking logs is basically the most sensible way to estimate the parameters of that log-normal model.
Trying to do that on the original scale just makes your life hard.
See here, starting at "The MLE is also invariant with respect to certain transformations of the data." down to "For example, the MLE parameters of the log-normal distribution are the same as those of the normal distribution fitted to the logarithm of the data."
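Here's a quick R sketch demonstrating that equivalence (the simulated data, sample size, and starting values are arbitrary):

    # lognormal MLE = normal MLE on the logs
    set.seed(1)
    y <- rlnorm(1000, meanlog = 2, sdlog = 0.5)

    mu.hat    <- mean(log(y))                      # normal MLE of the mean on log(y)
    sigma.hat <- sqrt(mean((log(y) - mu.hat)^2))   # ML variance divides by n, not n - 1

    # a direct numerical MLE on the original scale gives the same answer, with more work
    # (the sd is parameterised as exp(p[2]) to keep it positive during optimisation)
    nll <- function(p) -sum(dlnorm(y, meanlog = p[1], sdlog = exp(p[2]), log = TRUE))
    est <- optim(c(1, 0), nll)$par
    c(est[1], exp(est[2]))                         # approximately c(mu.hat, sigma.hat)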
Note that lognormal and gamma models aren't the only models suitable for something that's linear in the logs with constant variance on the log-scale; they just happen to both be quite convenient.
Best Answer
The family describes the distribution of the response. It's what makes a GLM different from a standard plain linear model. In a linear model your responses (y-values) are considered to be normally distributed with mean aX + b and variance sigma-squared. In a GLM of another family, the responses are thought to come from another distribution, such as Poisson or binomial, via a transformation of the aX + b bit. If your response values are small counts (classic example: the number of cars passing a post every minute on a quiet road) then the linear model is inappropriate, because it might end up predicting a negative number of cars; so you use a Poisson regression, where the distribution is non-negative.
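As a rough illustration in R (the traffic data here is simulated, not real), compare a linear model with a Poisson GLM on small counts:

    # hypothetical counts: cars per minute past a post, more cars later in the day
    set.seed(42)
    traffic <- data.frame(hour = rep(0:23, each = 10))
    traffic$cars <- rpois(nrow(traffic), lambda = exp(0.1 + 0.05 * traffic$hour))

    fit.lm  <- lm(cars ~ hour, data = traffic)                    # can predict negative counts
    fit.glm <- glm(cars ~ hour, family = poisson, data = traffic) # mean exp(a*hour + b) stays positive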
This affects how you think about residuals, and the GLM code in R can return various types of residuals -- see help(residuals.glm) for details. For example, the size of the residuals in your plot, together with you telling us that the data is in the thousands, makes me think these are 'response' residuals, which aren't scaled the way some of the other types are.
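For instance, using any glm() fit (here a small made-up Poisson fit), the main types look like this:

    d   <- data.frame(x = 1:50, y = rpois(50, lambda = 3))
    fit <- glm(y ~ x, family = poisson, data = d)

    residuals(fit, type = "response")  # y - fitted(fit): raw, on the scale of the data
    residuals(fit, type = "pearson")   # response residuals scaled by the square root of the variance function
    residuals(fit, type = "deviance")  # the default: signed square roots of the deviance contributions
    residuals(fit, type = "working")   # residuals on the working (linearised) scale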
You don't think the blue line in your plot is the 'distribution of your residuals', do you? You should maybe have a look at a histogram, a boxplot, or the quantiles of your residuals.
Fitted-vs-residuals plots like the one you've given us tell us some things: for example, if you see a strong pattern of high, low, high residuals, that's saying there's some more curviness in your data that your model hasn't captured (because your residuals aren't independent); if your residuals are all small at low fitted values and then large at high fitted values (but all balanced around zero), then your variance isn't constant and you should think carefully about a transformation of some kind.
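If you want something to run against your own model, here's a sketch of those checks (the fit object and data below are placeholders; substitute your own):

    # quick residual diagnostics for a fitted GLM
    d   <- data.frame(x = 1:100, y = rpois(100, lambda = exp(0.02 * (1:100))))
    fit <- glm(y ~ x, family = poisson, data = d)
    r   <- residuals(fit, type = "deviance")

    plot(fitted(fit), r, xlab = "Fitted values", ylab = "Deviance residuals")
    abline(h = 0, lty = 2)        # look for curvature or a variance fan around this line
    hist(r, breaks = 20)          # shape of the residual distribution
    boxplot(r); quantile(r)       # the checks suggested above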