Solved – Gamma GLM with log link – what do the predicted values mean

gamma distribution, generalized linear model, link-function, logarithm

Does the predict function in R for gamma glm with log link predict the actual values or the mean value?

I have a gamma GLM in R with a log link. I use predict(model, data, type = 'response') to get predictedval on the scale of the response variable. If I then use the predicted value inside the qgamma function, as qgamma(0.9, shape = 1/dispersion, scale = predictedval*dispersion), what does the output of qgamma signify?

Best Answer

I was hoping someone else would expand upon my comments in a full answer, but here is mine, in case anyone needs help with a similar question.

1. Response to what does it predict

A GLM, like ordinary regression, predicts the mean of the response given the independent variables. Selecting type = "response" in the predict function back-transforms the prediction out of the link scale (it applies the inverse link), so the prediction is on the same scale as the dependent variable. If you "overfit" the model you can recover the exact values of y, but when modeling you're trying to generalize and approximate. The prediction may not equal the arithmetic mean of y, because the mean is estimated by maximum likelihood and is conditional on the included independent variables.
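To make the link-scale versus response-scale point concrete, here is a minimal sketch on simulated data. The data, the coefficients 0.5 and 1.2, and the variable names are all made up for illustration. It checks that type = "response" is just the inverse link (exp, for a log link) applied to the type = "link" prediction, which is an estimate of the conditional mean.

```r
# Simulated data (hypothetical): gamma response whose mean depends on x
set.seed(1)
n <- 500
x <- runif(n)
true_mean <- exp(0.5 + 1.2 * x)                  # assumed true conditional mean
y <- rgamma(n, shape = 4, scale = true_mean / 4) # mean = shape * scale = true_mean

model <- glm(y ~ x, family = Gamma(link = "log"))

pred_link <- predict(model, type = "link")       # linear predictor, log scale
pred_resp <- predict(model, type = "response")   # estimated mean, scale of y

# For a log link the inverse link is exp(), so these two should agree
print(all.equal(pred_resp, exp(pred_link)))

# The same holds for new data: the prediction is exp(intercept + slope * x)
new_data <- data.frame(x = 0.5)
pred_new <- predict(model, newdata = new_data, type = "response")
by_hand  <- exp(coef(model)[["(Intercept)"]] + coef(model)[["x"]] * 0.5)
print(c(predict = unname(pred_new), by_hand = by_hand))
```

The predicted value at x = 0.5 is an estimate of the average response for observations with that x, not a guess at any single observation.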

2. Response to how to "know" if the gamma distribution is right for my data

The gamma distribution is very flexible and is, in fact, a family of distributions whose shape changes with its parameters; in a gamma GLM the variance increases with the mean. The gamma distribution requires the data to be strictly positive and continuous. If your data meet those conditions, the gamma distribution will quite often fit. Additionally, the Pearson residuals should be approximately normally distributed and show no signs of heteroscedasticity. For more on model evaluation see: https://www.crcpress.com/Extending-the-Linear-Model-with-R-Generalized-Linear-Mixed-Effects-and/Faraway/p/book/9781498720960
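A rough sketch of those checks, again on simulated data (the data and variable names are hypothetical, and the plots are only one of several reasonable ways to look at the residuals):

```r
# Simulated data (hypothetical) standing in for a real data set
set.seed(2)
n <- 300
x <- runif(n)
y <- rgamma(n, shape = 5, scale = exp(1 + 0.8 * x) / 5)

# Basic requirement: the response must be strictly positive
stopifnot(all(y > 0))

model <- glm(y ~ x, family = Gamma(link = "log"))

res_pearson <- residuals(model, type = "pearson")
fitted_link <- predict(model, type = "link")

# Numeric summary of the Pearson residuals
print(summary(res_pearson))

# Visual checks; only drawn in an interactive session
if (interactive()) {
  # Spread should look roughly constant across the fitted values
  plot(fitted_link, res_pearson,
       xlab = "Fitted values (link scale)", ylab = "Pearson residuals")
  abline(h = 0, lty = 2)

  # Points should fall roughly along the reference line
  qqnorm(res_pearson)
  qqline(res_pearson)
}
```

A funnel shape in the first plot, or strong curvature in the second, would suggest the gamma family or the chosen link is not describing the data well.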

3. Response to what happens when I use the qgamma function?

I am not 100% sure why the person you borrowed your sample code from is using the qgamma function. However, it looks to me like they are trying to return the 90th percentile of the predicted response, i.e., an upper limit for an individual observation rather than for the mean. This is the code I use to return 95% confidence limits on the predicted mean.

 preds_link <- predict(model, newdata = test_data,
                       type = "link",
                       se.fit = TRUE)  # predictions on the link scale, with standard errors
 critval <- 1.96  # critical value for an approximate 95% CI
 upper_ci_link <- preds_link$fit + (critval * preds_link$se.fit)  # upper CI limit on the link scale
 lwr_ci_link <- preds_link$fit - (critval * preds_link$se.fit)  # lower CI limit on the link scale
 fit_link <- preds_link$fit  # fitted values on the link scale
 upper_ci <- model$family$linkinv(upper_ci_link)  # back-transform limits to the response scale
 lwr_ci <- model$family$linkinv(lwr_ci_link)
 fit <- model$family$linkinv(fit_link)  # back-transform the fitted values
 preds <- data.frame(fit, lwr_ci, upper_ci,
                     fit_link, lwr_ci_link,
                     upper_ci_link)  # puts predictions and CIs in a single data frame
 colnames(preds) <- c("Prediction", "LCL", "UCL",
                      "Link_Prediction", "Link_LCL", "Link_UCL")  # give variables logical names

Prediction, LCL, and UCL: Prediction and Confidence Limits on response scale (i.e., same scale as data)

Link_Prediction, Link_LCL, and Link_UCL: Prediction and Confidence Limits on link scale (i.e., log transformed)

If you want 90% confidence limits instead, change the critical value to the one corresponding to $\alpha$ = 0.10 instead of $\alpha$ = 0.05 (about 1.645 rather than 1.96; in R, qnorm(0.95)).
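Returning to the qgamma call from the question, here is a hedged sketch of what it computes, using simulated data and hypothetical names (new_data, q90). A gamma distribution with shape = 1/dispersion and scale = predictedval*dispersion has mean shape * scale = predictedval, so qgamma(0.9, ...) gives the 90th percentile of the fitted response distribution at each prediction. This is a plug-in quantile: it treats the estimated mean and dispersion as known and ignores their uncertainty, so it is not the same thing as the confidence limits on the mean computed above.

```r
# Simulated data (hypothetical) and a gamma GLM with log link
set.seed(3)
n <- 500
x <- runif(n)
y <- rgamma(n, shape = 4, scale = exp(0.5 + 1.2 * x) / 4)
model <- glm(y ~ x, family = Gamma(link = "log"))

new_data <- data.frame(x = c(0.2, 0.8))          # hypothetical new observations
predictedval <- predict(model, newdata = new_data, type = "response")
dispersion <- summary(model)$dispersion          # estimated dispersion parameter

# Gamma parameters implied by the fitted mean and dispersion
shape <- 1 / dispersion
scale <- predictedval * dispersion               # so that shape * scale = predictedval

# 90th percentile of the fitted response distribution at each new x
q90 <- qgamma(0.9, shape = shape, scale = scale)

print(data.frame(x = new_data$x, mean = predictedval, q90 = q90))
```

Under the fitted model, roughly 90% of new observations at a given x would be expected to fall below the corresponding q90.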