Solved – Generalized linear model: link function Power(-1)

gamma distribution, generalized linear model, link-function, spss

During our study of statistics in my psychology coursework, we had to teach ourselves how to use generalized linear models in SPSS (only basic knowledge). For an exam we may also use generalized linear models and I want to try it, but I have two problems:

  1. According to the distribution of the dependent variable, a normal distribution would fit our data best. A fellow student said that this is not logical, because our dependent variable is "time required to complete a task", so it cannot have negative values. Our lecturer wrote on his slides that a variable needs to be able to range from $-∞$ to $+∞$ to use the normal distribution.

    In another post on this site, I read that the normal distribution creates a predicted variable which fits these characteristics, so the original dependent variable doesn't need to fulfill those characteristics. Is that true?

  2. As I can't use the normal distribution, a Gamma distribution would fit best. According to SPSS's documentation, the canonical link function is Power(-1). I don't quite understand what I have to put in the box "Power" in SPSS. Do I have to fill in 0,05 if I want to work with a Power of 0,95 (so Power(-1) would mean 1-0,95), or do I have to fill in -1 as the value of the power (which seems quite unusual to me)?

Best Answer

There are a few different concepts here.

First, no real dataset follows the normal distribution, but many are "close enough". If the observed times in your data are not close to 0 then an approximate normal may be good enough. If there are values near 0, or you want to make predictions where the conditions may push the normal model into making predictions below 0, then the normal is not appropriate and something like the gamma would be better. But for gamma distributions with small standard deviation relative to the mean, the normal is often a reasonable approximation.
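As a rough illustration of "close enough", the sketch below computes how much probability a normal model places on impossible negative times. The means and standard deviations are hypothetical numbers chosen for illustration, not values from the question, and `normal_mass_below_zero` is a helper name made up for this example.

```python
from statistics import NormalDist


def normal_mass_below_zero(mean, sd):
    """Probability that a Normal(mean, sd) model assigns to values below 0."""
    return NormalDist(mu=mean, sigma=sd).cdf(0.0)


# Hypothetical task times in seconds (illustrative numbers only).
far_from_zero = normal_mass_below_zero(mean=10.0, sd=2.0)  # mean is 5 sd above 0
near_zero = normal_mass_below_zero(mean=10.0, sd=8.0)      # mean is 1.25 sd above 0

print(f"sd = 2: P(time < 0) = {far_from_zero:.2e}")
print(f"sd = 8: P(time < 0) = {near_zero:.3f}")
```

When the standard deviation is small relative to the mean, the normal model puts a negligible share of its probability below 0; when it is large, roughly a tenth of the predicted times in this example would be negative, which is where a gamma model becomes the more sensible choice.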

When the documentation refers to Power(-1), "power" does not mean statistical power (the probability of rejecting a false null hypothesis). Rather, it is an exponent: the default link for gamma GLMs models the reciprocal ($\frac{1}{\mu} = \mu^{-1}$) of the mean of the response variable as a linear function of the predictors, so the value that belongs in the "Power" box is the exponent, -1. This can make sense for time data. Instead of modeling the time to finish the task, you are modeling the amount of the task finished in 1 unit of time.
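To make the reciprocal link concrete, here is a minimal sketch in plain Python of how a power link with exponent -1 connects a linear predictor to a mean time. The function names, the coefficients, and the predictor values are all hypothetical; this is not SPSS output or SPSS's implementation, only the arithmetic of the link.

```python
def power_link(mu, power):
    """Map a mean onto the linear-predictor scale: eta = mu ** power."""
    return mu ** power


def inverse_power_link(eta, power):
    """Map a linear predictor back to the mean: mu = eta ** (1 / power)."""
    return eta ** (1.0 / power)


# Hypothetical coefficients on the "rate" scale (tasks finished per second).
intercept, slope = 0.05, 0.01

for x in (0, 5, 10):
    eta = intercept + slope * x               # linear predictor, i.e. 1 / mean time
    mean_time = inverse_power_link(eta, -1)   # back-transform to seconds
    print(f"x = {x:2d}: rate = {eta:.2f}, mean time = {mean_time:.2f}")
```

In this made-up example a higher rate means a shorter expected time, so a positive slope on the linear predictor corresponds to faster completion. One caveat with this link is that the linear predictor has to stay positive for the back-transformed mean to be a valid (positive) time.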