I read somewhere that you can compute a "residual value" for a GLM by dividing the observed value of your response variable by its predicted value.
For example, suppose the response variable y represents the number of cars and $x_1$ represents the age of a car. We would fit a GLM and compute the "residual value", denoted residual below, for every observation in our data set with something like the following in R:
library(dplyr)
m <- glm(y ~ x_1, data = dataset, family = poisson(link = 'log'))
dataset <- dataset %>% mutate(pred_value = predict(m, type = 'response'))
dataset <- dataset %>% mutate(residual = y / pred_value)
I'm wondering whether that actually makes sense, since, unlike linear regression, the GLM equation generally doesn't contain an additive residual term.
If not, what would be the best way to compare predicted versus observed values? The goal is to see whether one can derive a second predictor from the noise not modeled by the GLM.
Best Answer
For a Poisson GLM you can get the residuals; as @Demetri commented, the raw residual is (y_observed - y_predicted):
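A minimal sketch of that computation, using simulated data (the variable names x_1 and y are illustrative assumptions, mirroring the question):

```r
set.seed(1)
dataset <- data.frame(x_1 = runif(200, 0, 10))
dataset$y <- rpois(200, lambda = exp(0.1 + 0.2 * dataset$x_1))

m <- glm(y ~ x_1, data = dataset, family = poisson(link = "log"))

# Raw residual: observed minus fitted value on the response scale
dataset$raw_residual <- dataset$y - predict(m, type = "response")

# The same values are available via the built-in accessor
head(residuals(m, type = "response"))
```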
You can compare it against the outcome:
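For the comparison, a simple base-R plot of residuals against the observed outcome could look like this (again on simulated data; names are illustrative):

```r
set.seed(1)
dataset <- data.frame(x_1 = runif(200, 0, 10))
dataset$y <- rpois(200, lambda = exp(0.1 + 0.2 * dataset$x_1))
m <- glm(y ~ x_1, data = dataset, family = poisson(link = "log"))
dataset$raw_residual <- dataset$y - predict(m, type = "response")

# Residuals against the observed outcome, with a zero line for reference
plot(dataset$y, dataset$raw_residual,
     xlab = "Observed y", ylab = "Raw residual")
abline(h = 0, lty = 2)
```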
Now, if we want to explore whether the residuals can be explained by other variables, I don't think you can fit a Poisson GLM to them again (they are no longer non-negative counts; I might be wrong), so maybe we explore that with a regression tree:
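One way to sketch this, using the rpart package (which ships with R); here x_2 is an assumed extra covariate that the GLM did not use:

```r
library(rpart)

set.seed(1)
dataset <- data.frame(x_1 = runif(500, 0, 10), x_2 = runif(500, 0, 1))
# y depends on x_1 only; x_2 is pure noise in this simulation
dataset$y <- rpois(500, lambda = exp(0.1 + 0.2 * dataset$x_1))

m <- glm(y ~ x_1, data = dataset, family = poisson(link = "log"))
dataset$raw_residual <- dataset$y - predict(m, type = "response")

# A regression tree on the residuals: if the GLM captured x_1's effect,
# x_1 should contribute little or nothing to the splits
tree <- rpart(raw_residual ~ x_1 + x_2, data = dataset)
print(tree)
```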
You can see that age no longer has an effect, though in this dataset the other remaining factors unfortunately do. Maybe you can also check this.
I am guessing your intent is to control for / regress out the effect of certain variables from your response and fit the remainder to another model. You can consider fitting a full model with all the covariates, regressing out the so-called "nuisance" terms, and fitting everything again.
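A hypothetical sketch of that idea (the covariate name nuis is an assumption; the general pattern is to zero out the nuisance term's contribution on the link scale, then map back to the response scale):

```r
set.seed(2)
dataset <- data.frame(x_1 = runif(300, 0, 10), nuis = rnorm(300))
dataset$y <- rpois(300,
                   lambda = exp(0.1 + 0.2 * dataset$x_1 + 0.3 * dataset$nuis))

# Full model with the covariate of interest plus the nuisance covariate
full <- glm(y ~ x_1 + nuis, data = dataset, family = poisson(link = "log"))

# Remove the nuisance term's contribution from the linear predictor,
# then exponentiate to return to the response (count) scale
lp_adj <- predict(full, type = "link") - coef(full)["nuis"] * dataset$nuis
dataset$adj_pred <- exp(lp_adj)
```

The adjusted predictions (or the corresponding adjusted residuals) could then be fed into a second model.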