I have some count data on woodland species that appear to be fairly normally distributed (see histogram below). I am fitting them as the response variable in a GLM with multiple explanatory variables. Since the distribution appears normal, is it better in this instance to use a Gaussian family for the GLM, even though the data are counts? I have looked at similar questions, but none asks quite the same thing.
Distributions – Using Poisson Distribution for Normally Distributed Count Data
Tags: count-data, distributions, generalized linear model, normal distribution
Best Answer
You appear to be looking at the marginal distribution of the counts, but only the conditional distribution (the distribution of the response given the explanatory variables) matters. My answer to "What if residuals are normally distributed, but y is not?" addresses the converse situation. To assess this properly, fit a model and look at the residuals.
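To make the marginal-versus-conditional point concrete, here is a minimal sketch in Python (standard library only). It uses simulated data, not your woodland counts: the single covariate `x`, the log-linear mean `exp(1 + x)`, the sample size, and the seed are all assumptions chosen for illustration. It simulates Poisson counts, fits a straight line by ordinary least squares, and then checks two things: how the variance-to-mean ratio differs between the pooled counts and counts with similar `x`, and whether the residual spread grows with the fitted value.

```python
import math
import random
import statistics


def poisson_draw(rng, mean):
    """Draw one Poisson variate with Knuth's method (adequate for modest means)."""
    limit = math.exp(-mean)
    k, prod = 0, rng.random()
    while prod > limit:
        k += 1
        prod *= rng.random()
    return k


def dispersion(values):
    """Variance-to-mean ratio; roughly 1 for Poisson-distributed values."""
    return statistics.pvariance(values) / statistics.fmean(values)


rng = random.Random(42)  # arbitrary seed, only for reproducibility

# Hypothetical data: counts whose conditional mean is exp(1 + x).
n = 2000
x = [rng.uniform(0.0, 2.0) for _ in range(n)]
y = [poisson_draw(rng, math.exp(1.0 + xi)) for xi in x]

# Marginal view: all counts pooled, ignoring x.
marginal_ratio = dispersion(y)

# Conditional view: counts for a narrow band of x values only.
cond_ratio = dispersion([yi for xi, yi in zip(x, y) if 0.9 <= xi <= 1.1])

# Simple OLS fit (closed form for one predictor), then residuals.
x_bar, y_bar = statistics.fmean(x), statistics.fmean(y)
slope = (sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
         / sum((xi - x_bar) ** 2 for xi in x))
intercept = y_bar - slope * x_bar
fitted = [intercept + slope * xi for xi in x]
resid = [yi - fi for yi, fi in zip(y, fitted)]

# Compare residual spread in the lower and upper halves of the fitted values.
order = sorted(range(n), key=lambda i: fitted[i])
sd_low = statistics.pstdev([resid[i] for i in order[: n // 2]])
sd_high = statistics.pstdev([resid[i] for i in order[n // 2:]])

print("variance/mean, pooled counts:   ", round(marginal_ratio, 2))
print("variance/mean, counts near x=1: ", round(cond_ratio, 2))
print("residual SD, low fitted values: ", round(sd_low, 2))
print("residual SD, high fitted values:", round(sd_high, 2))
```

In this simulated setting the pooled counts look far more dispersed than counts with similar `x`, and the OLS residuals fan out as the fitted value increases. With your own data you would run the equivalent checks on the residuals of the model you actually fit, for example by plotting residuals against fitted values.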
There are several issues with fitting an incorrect type of model to data (e.g., an OLS regression for count data):
All in all, it isn't clear from what you've presented here that you should use OLS. It may be acceptable, or it may be possible to make it acceptable, but you'll have to check the results and think carefully about them, your situation, and your goals.