GLM to normal distribution

generalized linear model, glmm, normal distribution, r, regression

I have a dataset with five variables: body temperature (the dependent variable) and air temperature, substrate temperature, precipitation, and relative humidity (the independent variables). To test whether my independent variables affect body temperature, I thought about using a GLM, but I'm uncertain whether this would be the most appropriate procedure, since body temperature has a normal distribution. Would this really be a barrier to using a GLM in this situation? Would a mixed model be more appropriate?

Best Answer

There seems to be some confusion.

A generalised linear model is indicated when the response variable is a count (or some other discrete variable) or, if it is continuous, when the conditional distribution of the response (that is, conditional on the covariates) follows a non-normal distribution. A common example is where the response takes values only within an interval, e.g. probabilities, which are bounded by [0, 1], are commonly modelled using beta regression, since the beta distribution is defined on the interval [0, 1]. In general, for a generalised linear model we have:

$$ \text{link}\bigg(\mathbb E\big[Y\vert X\big]\bigg) = X\beta $$ where $X$ is the model matrix of fixed effects and $\beta$ is the parameter vector.

So, if the link function is the identity function, and the response distribution is the normal distribution, this is exactly the same as multivariable linear regression:

$$ \mathbb E \big[Y\vert X\big] = X\beta $$
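The equivalence can be checked numerically. The following is a minimal sketch on simulated data: the variable names, sample size, and coefficients are illustrative assumptions, not values from the question. It fits the same formula with `lm()` and with `glm()` using a Gaussian family and identity link, then compares the coefficient estimates.

```r
# Simulated data; names and coefficients are illustrative assumptions.
set.seed(1)
n <- 200
mydata <- data.frame(
  air_temp       = rnorm(n, mean = 25, sd = 4),
  substrate_temp = rnorm(n, mean = 27, sd = 5),
  precip         = rexp(n, rate = 0.2),
  rel_hum        = runif(n, min = 40, max = 95)
)
mydata$body_temp <- 10 + 0.4 * mydata$air_temp + 0.3 * mydata$substrate_temp -
  0.05 * mydata$precip + 0.02 * mydata$rel_hum + rnorm(n, sd = 1)

# Ordinary multivariable linear regression
fit_lm <- lm(body_temp ~ air_temp + substrate_temp + precip + rel_hum,
             data = mydata)

# GLM with normal response distribution and identity link
fit_glm <- glm(body_temp ~ air_temp + substrate_temp + precip + rel_hum,
               family = gaussian(link = "identity"), data = mydata)

# Compare coefficient estimates up to numerical tolerance
same_coefs <- isTRUE(all.equal(coef(fit_lm), coef(fit_glm)))
print(same_coefs)
```

So a normally distributed response is no barrier to calling `glm()`; with these settings it simply reproduces the linear regression fit.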

In your case, you appear to have a continuous response, so unless there is some underlying theory that suggests a GLM such as a gamma model is indicated, I would start with a multivariable linear regression. In R, we would have:

lm(body_temp ~ air_temp + substrate_temp + precip + rel_hum, data = mydata)

[Of course you may want to allow for nonlinearities by fitting interactions and/or nonlinear terms, as indicated by the underlying theory]
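As a rough illustration of such terms, the sketch below adds an interaction between air and substrate temperature and a quadratic term for precipitation, then compares the result with the main-effects model. Which terms to include is an assumption made purely for illustration, and the data are simulated; in practice the choice should come from the underlying theory.

```r
# Simulated data; names and coefficients are illustrative assumptions.
set.seed(2)
n <- 200
mydata <- data.frame(
  air_temp       = rnorm(n, mean = 25, sd = 4),
  substrate_temp = rnorm(n, mean = 27, sd = 5),
  precip         = rexp(n, rate = 0.2),
  rel_hum        = runif(n, min = 40, max = 95)
)
mydata$body_temp <- 10 + 0.4 * mydata$air_temp + 0.3 * mydata$substrate_temp -
  0.05 * mydata$precip + 0.02 * mydata$rel_hum + rnorm(n, sd = 1)

# Main-effects model
fit_main <- lm(body_temp ~ air_temp + substrate_temp + precip + rel_hum,
               data = mydata)

# air_temp * substrate_temp expands to both main effects plus their interaction;
# poly(precip, 2) fits linear and quadratic terms for precipitation.
fit_flex <- lm(body_temp ~ air_temp * substrate_temp + poly(precip, 2) + rel_hum,
               data = mydata)

# F-test comparing the two nested models
cmp <- anova(fit_main, fit_flex)
print(cmp)
```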

A mixed model is also mentioned. This would be indicated when there are repeated measures, or some other kind of clustering, where observations within one cluster are more similar to each other than to observations in different clusters. In such a situation we often fit random intercepts to account for this. A model with both fixed effects and random effects is known as a mixed model.
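A random-intercept model could be sketched as follows, assuming (hypothetically) that each animal was measured several times and that a column called `individual` identifies the animal. Neither the grouping variable nor the simulated data come from the question. The sketch uses `lme()` from the `nlme` package; `lmer()` from `lme4` with a `(1 | individual)` term is a common alternative.

```r
library(nlme)  # recommended package, included with standard R installations

# Simulated repeated measures: 20 hypothetical animals, 10 observations each.
set.seed(3)
n_ind <- 20
n_obs <- 10
n <- n_ind * n_obs
ind_effect <- rnorm(n_ind, sd = 1.5)  # animal-specific shift in body temperature

mydata <- data.frame(
  individual     = factor(rep(seq_len(n_ind), each = n_obs)),
  air_temp       = rnorm(n, mean = 25, sd = 4),
  substrate_temp = rnorm(n, mean = 27, sd = 5),
  precip         = rexp(n, rate = 0.2),
  rel_hum        = runif(n, min = 40, max = 95)
)
mydata$body_temp <- with(mydata,
  10 + 0.4 * air_temp + 0.3 * substrate_temp - 0.05 * precip +
    0.02 * rel_hum + ind_effect[as.integer(individual)] + rnorm(n, sd = 1)
)

# Fixed effects for the covariates, random intercept for each animal
fit_mixed <- lme(body_temp ~ air_temp + substrate_temp + precip + rel_hum,
                 random = ~ 1 | individual, data = mydata)

print(fixef(fit_mixed))    # fixed-effect estimates
print(VarCorr(fit_mixed))  # between-animal and residual variance components
```

If the data contain no repeated measures or other clustering, there is nothing for the random intercept to capture and the ordinary linear regression is the natural starting point.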
