Solved – Fitting a Generalized Linear Model (GLM) in R

generalized linear model, link-function, r

I am learning about Generalized Linear Models and the use of the R statistical package, but, unfortunately, I am unable to understand some fundamental concepts.

I am trying to develop a Poisson GLM with a specific log link function. The model is of the form

$$\ln(E(y_i)) = \ln(\beta_1) + \beta_2 \ln(\text{exp}_1) + \beta_3 \ln(\text{exp}_2).$$

In this equation, $\text{exp}_1$ and $\text{exp}_2$ are measures of exposure in the model. From my understanding, in R, I would first load all the data and ensure it was properly set up. I believe I should then run:

model = glm(formula = Y~exp1+exp2, family=poisson(link="log"),data=CSV_table)

As I am new to GLMs and R, I am not exactly sure what specifying poisson(link="log") does. I hope this question isn't too trivial. I have been searching online for clear, concise explanations for hours; however, many answers and links assume a level of knowledge higher than mine.

Best Answer

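There are three components to the GLM: a distribution for the outcome variable, a linear predictor and a link function. The link function relates the expected value of the outcome variable to the linear predictor. In other words, the linear predictor models not the expected value itself, but a function of it. In R, family = poisson(link="log") sets the first and third of these components: the outcome is assumed to follow a Poisson distribution and the link is the logarithm. The formula supplies the linear predictor. An example with the logarithm as the link function and the linear predictor $\beta_0 + \beta_1*x$ is: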

$$\log(E(y)) = \beta_0 + \beta_1*x$$
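To make the role of the link concrete, here is a minimal sketch on simulated data. The variable names (x, y, fit) and the coefficient values 0.5 and 0.3 are assumptions chosen for illustration, not part of the question. It checks that the fitted linear predictor is the log of the fitted mean, which is what the log link asserts.

```r
# Simulated data; the true values beta_0 = 0.5 and beta_1 = 0.3 are arbitrary assumptions.
set.seed(1)
x <- runif(200, min = 0, max = 5)
y <- rpois(200, lambda = exp(0.5 + 0.3 * x))

fit <- glm(y ~ x, family = poisson(link = "log"))

# Linear predictor beta_0 + beta_1 * x, i.e. the scale of log(E(y))
eta <- predict(fit, type = "link")
# Fitted mean E(y), i.e. the scale of the outcome itself
mu <- predict(fit, type = "response")

# The link function connects the two scales: log(mu) should equal eta
link_matches <- isTRUE(all.equal(log(mu), eta))
print(link_matches)

# The family object carries the link and its inverse as plain functions
fam <- poisson(link = "log")
fam$linkfun(10)        # same as log(10)
fam$linkinv(log(10))   # same as exp(log(10))
```

predict(fit, type = "link") returns values on the scale of the linear predictor, while type = "response" returns them on the scale of the counts.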

In your case, with your coefficients relabeled as $\beta_0$, $\beta_1$ and $\beta_2$, the linear predictor is $\log(\beta_0) + \beta_1*\log({\rm exp}_1) + \beta_2*\log({\rm exp}_2)$. So the equation for your model becomes:

$$\log(E(y)) = \log(\beta_0) + \beta_1*\log({\rm exp}_1) + \beta_2*\log({\rm exp}_2)$$
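Exponentiating both sides may help show what this equation says on the scale of the outcome, assuming $\beta_0 > 0$ and strictly positive exposures so that all the logarithms are defined:

$$E(y) = \beta_0 * {\rm exp}_1^{\beta_1} * {\rm exp}_2^{\beta_2}$$

Read this way, $\beta_0$ is a multiplicative constant and $\beta_1$ and $\beta_2$ are the powers to which the two exposures are raised.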

I think this parameterization is somewhat unusual, and it may not be the model you are supposed to fit. In any case, to fit this model in R, the code should look like this:

model <- glm(formula = Y ~ log(exp1) + log(exp2), family = poisson(link="log"), 
             data = CSV_table)

The only thing you have to take care of after fitting the model is the intercept: R reports an estimate of $\log(\beta_0)$, so apply the exponential function to it if you want $\beta_0$ itself. A good book if you want to learn about the GLM and categorical data analysis in general is the one by Agresti (1996).
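The intercept back-transformation can be checked on simulated data. The sketch below reuses the names Y, exp1, exp2 and CSV_table from the question, but the data are generated here, and the true values (2, 0.8 and 0.5) are assumptions picked purely for illustration.

```r
# Simulated stand-in for CSV_table; exposures are strictly positive so log() is defined.
set.seed(42)
n <- 1000
CSV_table <- data.frame(exp1 = runif(n, 1, 10), exp2 = runif(n, 1, 10))

# Assumed true values: beta_0 = 2, beta_1 = 0.8, beta_2 = 0.5
CSV_table$Y <- rpois(n, lambda = 2 * CSV_table$exp1^0.8 * CSV_table$exp2^0.5)

model <- glm(formula = Y ~ log(exp1) + log(exp2), family = poisson(link = "log"),
             data = CSV_table)

est <- coef(model)

# R estimates log(beta_0) as the intercept, so exponentiate to get beta_0
beta0_hat <- exp(est[["(Intercept)"]])
# The slopes are already beta_1 and beta_2; no transformation is needed
beta1_hat <- est[["log(exp1)"]]
beta2_hat <- est[["log(exp2)"]]

print(c(beta0 = beta0_hat, beta1 = beta1_hat, beta2 = beta2_hat))
```

With a sample of this size the three estimates should land close to the assumed true values, though the exact numbers depend on the random seed.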

References:

Agresti, A. (1996). An introduction to categorical data analysis (Vol. 135). New York: Wiley.