Solved – Plotting the results of GLM in R

Tags: data visualization, generalized linear model, r, scatterplot

I have this data plotted as a scatter plot in Excel:

I ran a regression in Excel, and the p-value was 2.14E-05 while the R value was 0.32. I was told the R value was too low relative to the significance of the p-value, and was advised to account for the dispersion of the data by fitting a GLM with quasipoisson error in R.

This gave me

glm(formula = encno ~ temp, family = quasipoisson(link = log), 
    data = encnotemp)

Deviance Residuals: 
   Min      1Q  Median      3Q     Max  
-6.008  -2.431  -1.021   1.353   9.441  

Coefficients:
            Estimate Std. Error t value Pr(>|t|)    
(Intercept) 2.005807   0.174628  11.486  < 2e-16 ***
temp        0.029065   0.006528   4.453 1.53e-05 ***
---
Signif. codes:  0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1

(Dispersion parameter for quasipoisson family taken to be 10.19898)

    Null deviance: 1807.4  on 171  degrees of freedom
Residual deviance: 1620.1  on 170  degrees of freedom
AIC: NA

Number of Fisher Scoring iterations: 5

How do I analyse this output?

The problem is that the scatterplot data is too dispersed, and I would like to make a scatterplot from the quasipoisson GLM output that shows less dispersed (more fitted) data points. Will this be possible?

Best Answer

"I was told the R value was too low compared to the significance of the p value" -- sounds like nonsense to me.

On the other hand, some form of glm may be a good idea (but it looks to me like the spread may be increasing more than you might expect with a quasipoisson).

Note that nothing about the glm changes the spread of the data -- it only models changing spread (in a particular way). The data are still the data and if you plot them will still look as they do.
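To make "models changing spread in a particular way" concrete: the quasipoisson family assumes the variance is a constant multiple of the mean, Var(y) = φ·μ, and the "Dispersion parameter" line in the summary is the estimate of φ. The following is a minimal sketch on simulated data (the data frame `sim` and its generating parameters are made up for illustration and are not the asker's data). It suggests that the reported value is simply the Pearson chi-square statistic divided by the residual degrees of freedom:

```r
set.seed(1)
# Hypothetical overdispersed counts; a stand-in for encnotemp
sim <- data.frame(temp = runif(172, 9, 48))
sim$encno <- rnbinom(172, mu = exp(2 + 0.03 * sim$temp), size = 1.5)

fit_qp <- glm(encno ~ temp, family = quasipoisson(link = log), data = sim)

# Pearson chi-square over residual df: the estimate of phi
phi_hat <- sum(residuals(fit_qp, type = "pearson")^2) / df.residual(fit_qp)

phi_hat
summary(fit_qp)$dispersion  # the value printed as "Dispersion parameter"
```

The raw data in `sim` are untouched by the fit; only the standard errors of the coefficients are inflated to reflect the estimated dispersion.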

You can change the appearance of the data via a transformation. One that approximately stabilizes variance when the Poisson parameter is not very small is $\sqrt{y}$. If the Poisson parameter can take small values (which looks to me like it may well be the case here), you may like to try $\sqrt{y+\frac{3}{8}}$ or $\sqrt{y}+\sqrt{y+1}$ instead.

On the other hand, one that would linearize your fitted model would be a log (but that's only suitable if you don't have exact zeros).
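As a rough illustration of the variance-stabilizing idea, here is a small simulation sketch. The means and sample size are arbitrary choices for the demonstration, and the counts are pure Poisson, which is an assumption that overdispersed data like these would not satisfy exactly. For a Poisson variable the variance of $\sqrt{y+\frac{3}{8}}$ should sit near $\frac{1}{4}$ regardless of the mean:

```r
set.seed(42)
means <- c(5, 20, 100)  # arbitrary Poisson means for the demonstration

# Simulated Poisson counts, one column per mean
y <- sapply(means, function(m) rpois(1e5, m))

raw_var <- apply(y, 2, var)              # grows with the mean
ans_var <- apply(sqrt(y + 3/8), 2, var)  # roughly constant, near 1/4

round(rbind(mean = means, raw = raw_var, transformed = ans_var), 3)
```

With overdispersed counts the transformed variance would be larger than 1/4, but if the variance is still proportional to the mean it should stay roughly constant across the range.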

Although it won't be satisfying to you, you can plot the fitted curve via

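# Scatterplot of the raw data
plot(encno ~ temp, data = encnotemp, xlim = c(0, 60))
# Fit the quasipoisson GLM
encnoglm1 <- glm(formula = encno ~ temp, family = quasipoisson(link = log),
                 data = encnotemp)
# Grid of temperatures over which to draw the fitted curve
newdat <- data.frame(temp = seq(9, 48, 0.5))
# Predicted mean counts on the response scale
fit <- predict(encnoglm1, newdata = newdat, type = "response")
lines(newdat$temp, fit, col = 4)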

Or if you want to look at what would be a nearly constant variance if the quasipoisson were suitable:

# Same plot on the variance-stabilized scale (uses newdat and fit from above)
plot(sqrt(encno + 3/8) ~ temp, data = encnotemp, xlim = c(0, 60))
lines(newdat$temp, sqrt(fit + 3/8), col = 2)

Related Solutions

Solved – Overdispersion in poisson glm

The Poisson model assumes equal mean and variance. If that doesn't hold, then the Poisson model isn't correct. Quasi-poisson is one possibility when there is overdispersion. Others include:

- Negative binomial regression (NBR): similar to the Poisson model, but using the negative binomial distribution instead, which has a dispersion parameter. Available in the MASS package in R, and also integrated into Stata.
- Hurdle regression: for circumstances with more 0s than would be expected from the Poisson/NB model. It combines a logit/probit with Poisson/NB, where the logit/probit is used to estimate y=0 vs y>0, and a truncated Poisson/NB is used to estimate the cases where y>0. Available in the pscl package in R, and as a separate Stata add-on whose name I cannot remember.
- Zero-inflated models: zero-inflated Poisson (ZIP) and zero-inflated negative binomial (ZNB) models are similar to the hurdle model, but assume two mechanisms at work to generate 0s: never-takers, and potential takers who didn't take in this instance. Available in the pscl package, and available in Stata.

If your data are over-dispersed, try NBR, and compare the log-likelihoods (e.g. via AIC/BIC, etc.), and you can also get the statistical significance of the dispersion parameter from the NBR. From what I can tell, there is no disadvantage of using the NBR model relative to the Poisson model - so when I have overdispersion (in my limited experience, this has always been the case!), I simply use NBR and don't think twice. I could be wrong and would welcome others' thoughts. One of the downsides to the quasi-poisson is that it doesn't allow you to get likelihood-based stats, like the AIC/BIC. NBR uses MLE so it does.
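A minimal sketch of that comparison, on simulated data rather than any real dataset (the data frame `d`, its coefficients, and the `size` value are all made up for illustration). It assumes the MASS package is installed, which is the case for standard R distributions:

```r
library(MASS)  # provides glm.nb()

set.seed(7)
# Hypothetical overdispersed counts (not from any real study)
d <- data.frame(x = runif(200, 0, 40))
d$y <- rnbinom(200, mu = exp(2 + 0.03 * d$x), size = 1)

fit_pois <- glm(y ~ x, family = poisson, data = d)
fit_nb   <- glm.nb(y ~ x, data = d)

AIC(fit_pois, fit_nb)  # lower AIC indicates the better-supported model
fit_nb$theta           # estimated NB dispersion (size) parameter

# The quasi-Poisson fit has no likelihood, so its AIC is NA
fit_qp <- glm(y ~ x, family = quasipoisson, data = d)
AIC(fit_qp)
```

Because the data here were generated from a negative binomial distribution, the NB fit is expected to win on AIC; with real data the comparison can go either way.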

This is a wonderful reference walking through how these models are used; it's an R vignette, but even if you don't use R it should be very useful.

Solved – How to predict & plot quasipoisson GLM in R

From my research, it seems like both ways could be 'appropriate' for illustration, but many would probably agree that the second graph, with the plot on the scale of the response variable, is more intuitive to understand. I found this description of interpreting Poisson regressions to be helpful.

That document states that the equation for a Poisson model with a single covariate is ln(yi) = β0 + β1xi, which is equivalent to yi = e^(β0 + β1xi). Here yi stands for the mean (expected) count at xi, not an individual observation.

In my output, β0 is the intercept at 5.489, and β1 is the coefficient at -0.0027.

To determine what the mean value of y is at the intercept of x=0, I take e^β0, which is:

> exp(5.489465)
[1] 242.1276

Increasing x by 1 unit has a multiplicative effect of e^β1 on the mean of the Poisson. To find that factor, I exponentiate the coefficient of -0.0027:

> exp(-0.002744)
[1] 0.9972598

To get the value of y at x=1, I multiply the value at the intercept (242.1276) by 0.997:

> 242.1276 * 0.9972598
[1] 241.4641

The value at x=2 takes that value from x=1, (241.46), and multiplies by .997 again, equaling 240.802477. The prediction line simply does this along a list of x values multiplying the value of y at x-1 by .997.

An alternative way to understand and create this predictor line is to take the values of the linear plot (the first plot in the question) and compute the exponential of the value of y at any point along the line.
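The two routes described above (repeated multiplication by e^β1, and exponentiating the linear predictor directly) give the same curve. A small sketch using the coefficients quoted above; the range of x values is an arbitrary choice for illustration:

```r
b0 <- 5.489465   # intercept quoted above
b1 <- -0.002744  # slope quoted above
x  <- 0:5        # arbitrary illustrative x values

# Direct route: exponentiate the linear predictor
direct <- exp(b0 + b1 * x)

# Step-by-step route: start at exp(b0), multiply by exp(b1) per unit of x
stepwise <- exp(b0) * cumprod(c(1, rep(exp(b1), length(x) - 1)))

all.equal(direct, stepwise)  # TRUE up to floating-point error
```

In practice `predict()` with `type = "response"` on the fitted model does the direct calculation for you, so the manual version is mainly useful for checking your understanding.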

I have not yet figured out the issue with the negative binomial regression and plot, but I think this will suffice for my purposes.
